ENZYME TRAFFICKING

In the Golgi, LYSET ensures targeting of newly
synthesized lysosomal enzymes pp. 39 &


Presented by BII & Science


www.bii.dk/scienceprize


BII & Science Prize for Innovation


Bringing research
to life, and science
to market


Behind every life-changing solution is an 
entrepreneurial scientist—a creative mind 
who proved an idea in the lab and dared 
to carry it out in the world. 


To encourage more scientists to translate their 
research, BioInnovation Institute (BII) and
Science present a new annual award. 


Our three winners will have their essays published 
in Science magazine and will be invited into 
BII’s entrepreneurial ecosystem. In addition,
the Grand Prize winner will receive a prize
of USD 25,000, and each runner-up
will receive USD 10,000 at a grand award
ceremony in Copenhagen, Denmark. 


The call for applications has just opened. 
Apply before November 1, 2022. 


www.bii.dk/scienceprize 


See film from the 
grand award 
ceremony 2022 


EDITORIAL


Protect wildlife from livestock diseases 


This summer, seabirds in Europe, North America,
and Africa suffered unprecedented high mortal- 
ity as highly pathogenic avian influenza (HPAI) 
swept through their breeding colonies. Given 
that the potential for HPAI—which originated 
in farmed poultry—to affect wild birds has been 
known for more than a decade, how were these 
continents caught off guard? Nations must assume 
responsibility for protecting wildlife from anthropo- 
genic diseases, particularly those originating from ever- 
increasing livestock populations. 

HPAI typically emerges in commercial poultry farms 
by conversion of an innocuous wild-type virus—aptly 
called low pathogenic avian influenza 
virus—into one that causes high mor- 
tality in poultry. The current HPAI vi- 
rus originated in a commercial goose 
farm in China in 1996 and spread 
across the rapidly growing poultry 
populations in Asia, eventually sub- 
stantially spilling over into wild birds 
in 2005. The virus caused numerous 
wild bird outbreaks in Asia and Eu- 
rope, typically during autumn and 
winter, but has persisted year-round in 
wild birds in Europe since 2021. This 
year, it has spread quickly to breeding 
seabirds, including in Canada, France, 
Norway, South Africa, and the UK. 
For many of these long-lived species, 
already threatened by loss of habi- 
tat and climate change, the resulting 
mortality will have a large impact on their populations. 

The burgeoning worldwide production and trade of 
farmed animals poses an increasing threat of infectious 
diseases for wildlife. Globally, over the past 50 years, 
the population of poultry has grown 6.1-fold, from 5.71 
to 35.07 billion; of pigs, 1.7-fold from 547.17 to 952.63
million; and of cattle, 1.4-fold from 1.08 to 1.53 billion. 
These large livestock populations, which are connected 
through trade, form reservoirs where infectious dis- 
eases can evolve and spill over into wildlife, occasionally 
with devastating consequences. In 2016-2017, peste-des- 
petits-ruminants virus spread from livestock to saiga 
antelope, killing ~80% of this critically endangered spe- 
cies in Mongolia. Since 2007, African swine fever virus 
has spread across Europe and Asia through trade of pigs 
and pigmeat products, spilling over into wild boar and 
threatening endangered species of wild suids in Southeast
Asia. Other spillovers include Mycoplasma gallisepticum
bacteria from poultry to house finches and other songbirds
in North America, and Mycobacterium bovis bacteria from
cattle to wild mammals worldwide.

Livestock diseases are seen mainly as an economic 
problem for the agricultural sector (as well as a concern 
for human health if they can potentially pass from ani- 
mals to humans), and are managed as such by nations. 
However, given the high frequency with which these 
diseases spill over into wildlife, and their potential im- 
pact, they are clearly a major threat to the conserva- 
tion of biodiversity. This pressure comes on top of the 
stresses of habitat degradation, pollution, and climate 
change on wildlife. 

The unprecedented crisis of HPAI is a salient re- 
minder that prevention is critical. 
More consideration must be given 
to the risk of future spillovers from 
livestock to wildlife in proposed fun- 
damental reorganization of the food 
sector. Government departments re- 
sponsible for wildlife protection must 
develop policies that prevent such 
spillovers and, in case this fails, have 
multi-agency and multi-stakeholder 
plans and mitigation strategies to 
control disease spread. Transmissions 
of concern include goatpox from do- 
mestic goats to wild ruminants, swine 
influenza from domestic pigs to wild 
mammals, and Newcastle disease 
from poultry to wild birds. 

Preventive actions include reducing 
livestock herd sizes and densities of 
farms, limiting the transport of livestock among farms, 
and restricting contacts between farmed animals and re- 
lated wild species. In middle- and high-income countries, 
these efforts must be complemented by a transition from 
animal- to plant-based proteins in the human diet so that 
reduced livestock production is mirrored by an equiva- 
lent reduction in demand for meat, dairy, and eggs. 

For HPAI, this translates in the short term into accu- 
rate monitoring of the virus and associated mortality in 
wild birds, and, where appropriate, coordinated removal 
of infected wild bird carcasses from affected sites to limit 
virus spread. Long-term recommendations include en- 
hanced protection of seabird and waterbird sites, vacci- 
nation of poultry against HPAI, reduction of poultry farm 
size and density, and avoidance of waterbird-rich areas as 
a location for poultry farms. 

The HPAI outbreak in seabirds is a warning, with dev- 
astating consequences if not heeded. 

-Thijs Kuiken and Ruth Cromie


The Novo Nordisk Foundation and the BioInnovation Institute Foundation (BII) are
digging into the basic biology of metabolic diseases in search of new treatments.


Chemical reactions determine how a person’s body uses energy. “Cardiometabolic
disease, which I define as a continuum, starts with childhood obesity, type 2
diabetes, cardiovascular disease, and so on,” says Mads Krogsgaard Thomsen,
CEO of the Novo Nordisk Foundation in Hellerup, Denmark. “That whole metabolic-
disease spectrum is becoming the number one killer globally speaking, even
outcompeting oncology, and we feel that it should be our duty to heavily fund
research in that area.” As one effort in that direction, the foundation created its
Center for Basic Metabolic Research. “There, we’re focusing on getting to the
root of obesity and type 2 diabetes,” Thomsen says. The foundation also supports
Copenhagen-based BII, which funds a range of basic and clinical research on
metabolic diseases.

Ongoing work on human metabolic processes is revealing a broader impact
of the chemical imbalances caused by metabolic diseases. “We’re seeing that
the metabolism impacts not only the traditional metabolic diseases, but also
neurological disorders such as Alzheimer’s, Parkinson’s, and Huntington’s diseases,”
says BII CEO Jens Nielsen. “People are even beginning to talk about Alzheimer’s as
type 3 diabetes.”

The range of diseases related to metabolic conditions probably stretches even
farther. As scientists delve even more deeply into the details of metabolic processes,
Nielsen believes that other connections to diseases could be discovered.


Exploring new targets for treating obesity 

As part of its commitment to bolster research that tackles metabolic diseases, BII
is supporting Copenhagen-based Ousia Pharma through its Venture Lab 
acceleration program. 


The Venture Lab program is designed to help early-stage startups that are
committed to solving major global challenges related to human and planetary 
health. The program focuses on business acceleration, scientific development, and 
team development, and includes a €500,000 (USD 500,400) risk-free, convertible 
loan. During the program, the startups conduct experiments to reach initial proof 
of concept, and they receive help making a business plan and setting up a team to 
progress rapidly towards the market. 

Spun out from Christoffer Clemmensen’s research at the Novo Nordisk Foundation 
Center for Basic Metabolic Research at the University of Copenhagen, Ousia is 
working on peptide-drug conjugates as treatments for obesity. 

“We're trying to come up with new modes of action compared to the currently 
available therapies,” says Clemmensen, CEO of Ousia Pharma and associate 
professor at the University of Copenhagen. “We've tried to come up with new ways 
of thinking about targeting, weight loss, and especially appetite regulation.” In 
that vein, Ousia is developing a peptide-drug conjugate that will deliver a small- 
molecule payload to the brain. “We have data to support that this drug lowers body 
weight via augmenting neuroplasticity,” Clemmensen says, referring to the ability of 
neurons and neural networks to make adaptive structural changes. 

Those structural changes in the brain include neurons in the appetite centers 
in the hypothalamus and some areas of the brainstem. “In the brain, neurons 
communicate through synapses,” Clemmensen explains. “In many ways, that is a
point of intervention that has been less explored when it comes to weight loss.” 

Although existing therapies might eventually modulate communication between 
neurons, the Ousia Pharma approach achieves this directly. In particular, the Ousia 
peptide-drug conjugate will target the glutamatergic neurotransmitter system, 
which was identified via genetic studies of obesity. As a modified endogenous 
peptide, Ousia's lead peptide, OP-56, “targets the areas that are important for 
appetite but doesn't elicit adverse effects typically associated with targeting
glutamate receptors,” Clemmensen
says. Consequently, this peptide- 
directed targeting approach can 
deliver a drug to the desired areas 
in the brain without affecting 
others. 

OP-56 modulates glutamatergic
receptors called NMDA (N-methyl- 
D-aspartate). “We're using OP-56 
as a kind of guide—basically a 
Trojan horse—to target neurons 
in appetite centers to deliver 
NMDA receptor modulators,” 
Clemmensen says. Those 
modulators will at first block NMDA 
signaling, which has previously 
been linked to appetite suppression. 

So far, work in rodents shows that Ousia’s peptide-drug conjugate is safe and 
effective. “We're excited about bridging the work from rodent to primate models 
to really put our clinical drug candidate to the test in a more relevant species,” 
Clemmensen says. 


Targeting thermogenic fat and the brain 

Another company through which BII is supporting metabolism-related therapeutic
endeavors is Copenhagen-based Embark Biotech, which was spun out from 
associate professor Zach Gerhart-Hines’ lab, also at the Novo Nordisk Foundation 
Center for Basic Metabolic Research. As Embark CEO Casper Tind Hansen notes, 
“We are engineering drugs that leverage the communication between our fat tissue 
and brain to both increase calorie-burning and decrease appetite.” Targeting the 
fat tissue-brain axis offers opportunities to address numerous indications ranging 
from rare orphan genetic obesities and hyperphagic (food-craving) disorders to the 
broader cardiometabolic disease space. 

“Humans possess several types of fat cells which can either store calories— 
leading to expanding waistlines and obesity—or burn calories, to improve 
cardiometabolic health,” says Embark CSO Gerhart-Hines. “These latter fat cells, 
called thermogenic fat, are evolutionarily designed to expend excess energy, and 
they further benefit us by removing extraordinary amounts of glucose, lipids, and 
other macronutrients from our blood to fuel this calorie-burning.” 

While exploring ways to take advantage of thermogenic fat tissue, Gerhart-Hines 
and Embark scientists discovered that several key regulators of thermogenic fat 
also control food intake through appetite centers in the brain. “By tapping into the 
integrated communication between the fat tissue and brain, we can target one 
receptor and boost calorie-burning while also reducing food craving,” Hansen says. 

In its most mature pipeline program, Embark is developing a therapeutic that 
targets receptors localized within one of the major feeding control centers, the 
hindbrain, and in thermogenic fat tissue. "We've been fortunate to generate a 
molecule that could be injected once weekly,” Hansen says. “In preclinical models,
we can almost completely shut off the need or desire for hyperpalatable foods
(processed foods that are designed to maximize cravings) without triggering nausea.”

Left: Mads Krogsgaard Thomsen, CEO, Novo Nordisk Foundation. Right: Casper Tind Hansen, CEO, Embark Biotech.


Pursuing prevention

Despite a number of exciting 
treatments on the horizon for 
metabolism-related conditions, 
this growing global health care 
crisis drives scientists to address 
these diseases at their earliest 
stages. 

“Obesity is an overlooked chronic disease, and now we're beginning to develop 
the tools to help patients combine therapy with behavioral modification,” Thomsen 
says. In addition, he hopes that scientists can find ways to eliminate this disease. As 
he says, “It's very important to work out how to prevent childhood obesity.” 

Today, more than 650 million people worldwide live with obesity, and by 2030 
that number could rise to 1 billion. There is a growing understanding that obesity is 
interlinked with other cardiometabolic diseases, such as type 2 diabetes and heart, 
kidney, and liver disease. 

“At present, health care systems are mostly focused on treating the symptoms 
of these chronic diseases, rather than their root causes,” says Karin Conde-Knape, 
senior vice president for global drug discovery at Novo Nordisk. “The starting 
point for cardiometabolic conditions is often overweight and obesity. If we are 
able to prevent obesity, we could also prevent a lot of other chronic diseases from 
developing down the line.” 

To pursue that goal, “Novo Nordisk engages with partners globally to address the
root causes of disease and develop interventions to prevent the rise of metabolic
disease like type 2 diabetes and obesity,” Conde-Knape says. As a recent example,
Novo Nordisk initiated a partnership with BII to accelerate world-class innovation by
enrolling two projects in BII’s Venture Lab program. One of the projects focuses on
enabling digital health technologies for patients living with binge-eating disorder, 
which is strongly associated with obesity. 

From basic biology and clinical research through prevention and treatments, the 
community of experts in Denmark is working together to better understand and
treat a wide range of cardiometabolic diseases. The results promise to improve lives 
around the world. 


Sponsored by 


BioInnovation Institute


IN BRIEF 


Edited by 
Jeffrey Brainard 


Demonstrators took to the streets of Tehran, Iran, to protest the death of a woman in police custody. 


HUMAN RIGHTS 


Iranian university students rise up to protest crackdown


Iran’s universities have become hotbeds of protests against the government,
and of violent crackdowns by police, in the wake of the death of a young woman
detained by the country’s notorious morality police. The prestigious Sharif
University of Technology in Tehran, for example (lauded as Iran’s Massachusetts
Institute of Technology), erupted in protest on 2 October; eyewitness accounts
describe professors linking arms to form a human shield to protect student
protesters from police, who ended up arresting about 30. Similar convulsions
have occurred at more than 100 Iranian universities, part of what may be the
biggest challenge by Iranians to the Islamic clerics’ 43-year rule. More than
110 students had been detained as of 4 October; 1145 professors and lecturers
from across Iran signed a statement condemning their arrests. The woman who
died, Mahsa Amini, 22, was arrested on 16 September for allegedly wearing her
hijab improperly. She fell into a coma; police claim she suffered a heart
attack but fellow detainees say she was beaten.


White House offers AI guidance


POLICY | Americans deserve to be protected against artificial intelligence
algorithms that are discriminatory or violate their privacy, says an AI “bill of
rights” rolled out this week by the White House. The 73-page document lays out
five core principles the tech industry and public officials should follow when
using or regulating AI. But it is silent on how those principles should be
implemented and how they would be enforced, and senior administration officials
said there are no plans to turn them into specific legislation, as the European
Union is contemplating. Marc Rotenberg, an AI ethics and justice advocate, calls
the document “an important first step” toward a comprehensive U.S. policy on the
use of this technology, which is increasingly used in law enforcement, health
care, education, and other sectors of society. But Ben Shneiderman, a computer
scientist at the University of Maryland, College Park, says the policy doesn’t go
far enough. “We need to move past should-ism,” he says, “and tell people what
they need to do, and by when.”


European telescope array debuts


ASTRONOMY | The most powerful millimeter-wave radio telescope in the Northern
Hemisphere has been completed and was inaugurated last week. The Northern
Extended Millimeter Array (NOEMA) is built on an existing set of six 15-meter
dishes in the French Alps, to which six dishes have been added. The array, the
world’s second largest after a giant telescope in Chile, can be reconfigured,
spreading the dishes as far as 1.7 kilometers apart to sharpen its images.
NOEMA forms a part of the Event Horizon Telescope, a set of radio telescopes
around the world that images supermassive black holes. It will also be used to
study interstellar gases and the formation and dynamics of galaxies and stars.
It is run by the French national research agency, CNRS; Germany’s Max Planck
Society; and Spain’s National Geographic Institute.


U.S. agency seeks diversity plans 


POLICY | Researchers applying for grants
from the U.S. Department of Energy’s 
Office of Science—the country’s single big- 
gest funder of the physical sciences—must 
propose how the project will promote 
greater participation by researchers of 
color and other underrepresented groups, 
the office announced this week. These
disciplines are among the least diverse in
science; Black people earned just 0.5% of 
Ph.D.s in physics awarded by U.S. institu- 
tions from 1999 to 2020. The application’s 
Promoting Inclusive and Equitable
Research Plan must go beyond describing
the diversity efforts of the applicants’
institution, the agency says. For example, it
could assign scientists from underrepresented
groups to project leadership
roles and dedicate some of the grant
money to train and mentor group mem- 
bers. The requirement applies to new 
grants and renewals and to large and small 
teams. The Office of Science has recently 
unveiled other programs to advance 
diversity, including Reaching a New Energy 
Sciences Workforce, with $22 million dedi- 
cated this year for research at institutions 
including historically Black ones. 


Solar, wind power make strides 


CLEAN ENERGY | Solar and wind power hit 
new milestones in 2021, jointly supplying 
for the first time more than 10% of all elec- 
tricity generated globally. The two sources 
also accounted for three-quarters of all new 
electric-generating capacity installed that 
year as their costs dropped, according to a 
21 September report from BloombergNEF,
a research company. Advocates for reducing
carbon emissions hailed the trends. Still, 
production of coal power also rose as eco- 
nomic recovery increased demand, drought 
cut hydropower, and natural gas prices rose, 
although the global increase was the small- 
est in 15 years. Worldwide, coal remained 
the largest single source of electricity in 
2021. Half of the countries that pledged to 
phase out coal power at last year’s Glasgow, 
Scotland, climate summit instead reported 
that it grew in 2021. 


Renewables stake their place 

Solar and wind are the fastest growing sources of 
energy. Zero-carbon sources, which include hydro 
and nuclear, reached nearly 40% of the total in 2021. 
Generation is shown in thousands of terawatt-hours.


Greek soldiers known as hoplites are depicted on a sixth century ...


Ancient Greeks may have used mercenaries 


Chroniclers of ancient Greece helped create an enduring narrative that its victorious
armies were composed of citizen-warriors. But new evidence suggests in at least 
one battle, they had help—from mercenaries recruited from far away in Europe 
and Asia. A research team identified foreign lineages through genetic analysis of 
fallen soldiers buried near Himera, a Greek colony in Sicily, after a winning battle 
in 480 B.C.E. against Carthaginian invaders. What's more, the chemical composition of 
their bones shows many didn’t grow up near Himera, and the warriors’ good health in life 
suggests they had not been enslaved, the team reports this week in the Proceedings of 
the National Academy of Sciences. The powerful melding of bioanthropological evidence 
with historical accounts suggests the mercenaries made a difference: Another group of 
Himeran warriors, buried 70 years later, all resembled each other in genetic and isotopic 
signatures—suggesting they fought on their own. They lost that later battle. 


Hurricane spurs storm research 


NATURAL DISASTERS | The destruction 
wrought by Hurricane Ian in Florida last
week endangered some research projects 
while enabling others to gather new data 
about these big storms. An environmental 
science lab on Sanibel Island, run by the 
nonprofit Sanibel-Captiva Conservation 
Foundation, lost part of its roof and remains 
without power. Before the storm struck, 
University of Florida researchers deployed 
sensors around Punta Gorda Airport, near 
Fort Myers, to study how building codes can 
help structures survive high winds. Other 
researchers drove vehicles equipped with 
cameras in the area to gather images of 
damaged buildings, for a project on storms 
funded by the National Science Foundation. 


Alzheimer’s therapy scrutinized 


BIOMEDICINE | The pharmaceutical companies Biogen and Eisai last week
announced that a monoclonal antibody treatment reduced cognitive decline by
27% in people with early stage Alzheimer’s compared with those on a placebo
after 18 months. Lecanemab belongs to a class of therapies that break down or
inhibit buildup of amyloid plaques in the brain, and is apparently the first to
clearly subdue symptoms of the disease. But researchers also want to see more
data from the clinical trial, which has so far been shared only by press
release. Biogen and Eisai, which have applied for accelerated approval from
the U.S. Food and Drug Administration, say they plan to release more
information in November. One question is why lecanemab seems to show promise
when other therapies targeted at amyloid have failed to help patients. One
theory is that the treatment targets “protofibrils,” protein strands that
haven’t yet consolidated into plaques. If lecanemab is approved, it may prove
demanding for physicians to administer: The therapy is given by infusion, and
patients may need periodic imaging to look for side effects, such as small
brain hemorrhages.


Violent conflict in Myanmar linked to boom in amber studies


Among hundreds of publications on fossils preserved in 
amber, almost none include Myanmar researchers 


By Rodrigo Pérez Ortega 


Over the past decade, growing numbers of paleontologists have peered
into the past through a unique window: pieces of amber. These blobs
of hardened tree resin preserve in exquisite detail insects, plants, tiny
lizards, and bits of larger organisms, such
as the feathered tail from a dinosaur.

One of the world’s richest and oldest am- 
ber deposits, dating from late in the dino- 
saurs’ reign, is located in a country riven by 
political conflict: Myanmar. Now, a study sug- 
gests paleontological research has directly 
benefited from the conflict, which has cre- 
ated opportunities for ethically questionable 
mining, trade, and collecting practices. 

Myanmar’s discord precipitated a surge 
in specimens, and in the research they en- 
abled, the new paper claims. “This became 
the region of conflict, which is directly re- 
lated to amber,” says co-author Nussaibah 
Raja Schoob, a paleobiologist at the Friedrich 
Alexander University of Erlangen-Nuremberg 


10 7 OCTOBER 2022 + VOL 378 ISSUE 6615 


(FAU). And although outside researchers, 
mainly from China, have gleaned eye-catching 
findings from the specimens, Myanmar’s own 
small community of paleontologists has al- 
most never taken part, the study found. 

The study, which appeared last week in 
Communications Biology, “is thorough, well 
done, accurate, and needed,” says David 
Grimaldi, curator of the amber collection at 
the American Museum of Natural History in 
New York City, who says he stopped working 
on Myanmar amber in 2017. But other re- 
searchers see flaws in its analysis. 

Amber preserves fine details and soft tis- 
sue as if ancient organisms are frozen in time, 
and specimens from Myanmar are especially 
prized because they date to 99 million years 
ago in the Cretaceous period. “From all the 
deposits we know across the world, this is the 
one that preserves pieces of dinosaur habi- 
tat,” says study co-author Emma Dunne, a 
paleontologist at FAU. 

In a Bangkok gemstone market, a dealer
examines an amber specimen from Myanmar.

But most Myanmar amber mines are in the
northern state of Kachin, where the Kachin
Independence Army and the Myanmar government
military have been battling since
the 1960s. Both sides have benefited from
the amber trade, which is estimated at $1 billion
a year by the Kachin Development Networking
Group. In 2017, the mines fell under
the control of the Myanmar military, which a
United Nations Human Rights Council fact-finding
mission found has committed genocide
and crimes against humanity.

Most amber from Kachin is transported to 
the Chinese border town of Tengchong, al- 
though some is also sold in Myanmar. In bus- 
tling markets, some pieces are sold as jewelry 
whereas those with fossils may go to private 
collectors or paleontologists. Rare pieces that 
preserve bits of vertebrates may sell for up to 
hundreds of thousands of dollars. 

Myanmar prohibited the permanent trans- 
port of fossil material out of the country in 
2015. But amber falls in a legal gray zone, as 
it is also considered a gemstone, which can 
be legally exported. 

After a Science exposé in 2019 explored 
those ethical complications (24 May 2019, 
p. 722), the Society of Vertebrate Paleontology
(SVP) released a letter calling for a moratorium
on publishing studies of Myanmar
amber obtained after 2017, when the mili- 
tary took over the mines. But few journals 
changed their policies. Last year, SVP called 
again for a moratorium, on papers based 
on Myanmar amber obtained after a 2021 
military coup, and released guidelines on re- 
search with amber acquired earlier. 

Dunne, Raja Schoob, and their co-authors 
tracked the amber publication record and 
found it has moved in tandem with politi- 
cal and other events. Searching the Web of 
Science, the team identified 937 publications 
on Myanmar amber and 55 “control” publica- 
tions that used nonamber fossils from Myan- 
mar, such as mammals and petrified wood. 
They then calculated publication trends for 
amber and nonamber papers. 

Before 2014, the number of amber publica- 
tions rose slowly and steadily. But that year, 
it started to explode, and has since grown ex- 
ponentially (see graphic, p. 11). The team also 
found a geographic shift. Before 2014, the 
United States dominated, with 69 Myanmar 
amber papers. After that year, China quickly 
rose to the top, with 417 papers. “China has 
been sort of a juggernaut for the study of 
Burmese amber,” Grimaldi says. “They were
monopolizing the commercial market.” 

Political, legal, and economic changes 
underlie the trends, the researchers argue. 
Around 2010, China tapped out its am- 
ber mines, whereas amber from neighbor- 
ing Myanmar was ever more accessible in 
the Tengchong markets, Raja Schoob says,
driving the boom in papers from China. “A
country—China—with so many resources 
and so much access ... has driven so much 
interest,” Dunne says.

But other researchers say the boom sim- 
ply reflected a surge in academic interest. 
“There are several exaggerations and some 
mistakes” in the paper, says Shuo Wang, a 
paleontologist at the Qingdao University of 
Science and Technology. She argues that a 
2013 conference on fossil arthropods and am- 
ber opened the eyes of Chinese researchers 
to amber’s potential. And she notes that in 
2015 the journal Cretaceous Research put out 
a special issue on Myanmar amber, boosting 
the number of papers. Dunne agrees those 
factors could be contributing but says, “I 
highly doubt that scientific interest alone led 
to the massive increase in research output.” 

Wang agrees that for ethical reasons, 
paleontologists should restrict access to 
amber obtained after 2017. She says it’s not 
ideal to study specimens bought in markets, 
but Myanmar amber is otherwise nearly 
impossible to obtain. She and other groups 
have teamed up with researchers in Myan- 
mar to request permission to do fieldwork 
in the mines, but so far, the government has 
denied permits. 

She thinks scientists should be able to 
study materials obtained in the past, even 
if they lack documents establishing prov- 
enance. “It is impossible to provide purchase 
records and import and export certificates 
as required by SVP, because amber has been 
transferred many times before it entered the 
hands of scientists,” she says. “If papers cannot
be published due to ethical issues, it will
be a huge loss. ... A lot of secrets [will] remain 
buried with the amber.” 

The new paper also found that only 
three of 872 amber publications included 
co-authors from Myanmar. “For Myanmar 
researchers, it’s very difficult to access the 
Myanmar amber mines,” says co-author Zin- 
Maung-Maung-Thein, a paleontologist at the 
University of Mandalay. But he thinks Myan- 
mar amber should continue to be studied, 
and suggests foreign researchers contact 
Myanmar embassies for proper paperwork 
and enlist Myanmar scientists. “It will prob- 
ably take time ... but it’s a win-win situation.” 

Although few journals heeded SVP’s origi- 
nal calls for a moratorium, some have quietly 
stopped accepting papers on Myanmar am- 
ber. And others have openly discussed the 
ethical and legal issues. “It’s time to maybe 
take a step up with policy,” says Luiseach
Nic Eoin, a senior editor at Nature Ecology & 
Evolution. Last year, Nature Portfolio also re- 
vised its policies to curb so-called parachute 
science, in which foreign scientists work with 
little local involvement. Science follows SVP’s 
guidelines and updated its policies in 2020 
to call attention to concerns about fossils col- 
lected from areas of political strife, including 
Myanmar amber. 

For Margaret Lewis, a vertebrate paleonto- 
logist at Stockton University and current vice 
president of SVP, the controversy over Myan- 
mar amber has galvanized efforts to rethink 
broader ethical standards. “This is one of the 
times,” she says, “where what we do has a 
fundamental impact on the world.” 


Why amber research boomed

A bonanza of paleontological papers on fossils in amber from Myanmar appeared after Chinese amber mines were tapped out and as conflict continued in the country’s Kachin state.

NOBEL PRIZES


Entanglement snares prize


Trio helped launch new 
quantum revolution 


By Adrian Cho 


This year’s Nobel Prize in Physics honors
researchers who probed the nature of
reality and launched the field of quantum
information science. John Clauser
of J.F. Clauser & Associates and Alain
Aspect of the University of Paris-Saclay
and the Polytechnic Institute of Paris used a 
phenomenon called entanglement—which 
Albert Einstein derided as “spooky action at 
a distance”—to prove quantum uncertainty
cannot be explained away by some unseen 
physics. Anton Zeilinger of the University of 
Vienna showed entanglement can be used 
to, for example, “teleport” quantum informa- 
tion. “The recognition is overdue for these giants,”
says Adrian Kent, a quantum physicist
at the University of Cambridge. 

Two particles such as photons can be 
entangled so that even though the state of 
each one is uncertain, their two states are 
correlated. In 1964, British theorist John 
Bell realized entanglement could test the 
implication of quantum mechanics that a 
particle’s properties do not exist indepen- 
dently before they’re measured. Bell imag- 
ined two observers sharing pairs of photons 
entangled through their polarizations and 
comparing certain random measurements. 
He showed that if “hidden variables” pre- 
determine the results, the correlations be- 
tween the observers’ readings could only be 
so strong. If, as quantum mechanics pre- 
dicts, no such variables exist, the correla- 
tions could be stronger. 
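
Bell’s argument can be made concrete with a short calculation. The sketch below is illustrative only and is not drawn from the laureates’ papers: it assumes the CHSH form of Bell’s test, a textbook correlation of cos 2(a − b) for polarization-entangled photons, a standard choice of analyzer angles, and a deliberately simple toy hidden-variable model. The function names are hypothetical. Under these assumptions the hidden-variable model stays at the classical limit of 2, whereas the quantum prediction reaches about 2.83.

```python
import math

def quantum_correlation(a: float, b: float) -> float:
    """Textbook quantum prediction for polarization-entangled photons
    measured with analyzers at angles a and b (radians)."""
    return math.cos(2 * (a - b))

def hidden_variable_correlation(a: float, b: float, steps: int = 10_000) -> float:
    """Toy hidden-variable model (an assumption for illustration):
    both photons carry the same hidden polarization angle lam, and each
    detector deterministically reports +1 or -1 from its own setting and lam."""
    total = 0
    for k in range(steps):
        lam = (k + 0.5) * math.pi / steps  # midpoint grid over [0, pi)
        outcome_a = 1 if math.cos(2 * (a - lam)) >= 0 else -1
        outcome_b = 1 if math.cos(2 * (b - lam)) >= 0 else -1
        total += outcome_a * outcome_b
    return total / steps

def chsh(corr, a, a_prime, b, b_prime) -> float:
    """CHSH combination of four correlations; hidden variables require |S| <= 2."""
    return corr(a, b) - corr(a, b_prime) + corr(a_prime, b) + corr(a_prime, b_prime)

# Standard analyzer settings: 0 and 45 degrees on one side, 22.5 and 67.5 on the other.
settings = [math.radians(x) for x in (0.0, 45.0, 22.5, 67.5)]

S_hidden = chsh(hidden_variable_correlation, *settings)
S_quantum = chsh(quantum_correlation, *settings)

print(f"hidden-variable model: S = {S_hidden:.3f}")
print(f"quantum prediction:    S = {S_quantum:.3f}")
```

The gap between the two values is what the experiments described next set out to measure.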

In 1972, Clauser and a colleague performed 
a version of the experiment and observed the 
extra-strong correlations. In the 1980s, Aspect 
led much-refined experiments that ruled out 
various spurious correlations. A decade later, 
Zeilinger showed that entanglement can be 
swapped and can extend to distant particles, 
a step toward a quantum internet. 

This year’s Nobel underscores that “Bell 
should have been recognized,” says Jian-Wei 
Pan, a quantum physicist at the University of 
Science and Technology of China. “Unfortu- 
nately, quantum information science had still 
not yet emerged” when Bell died in 1990. 


With reporting by Jacklin Kwan and Dennis Normile. 
Ancient DNA pioneer Svante Pääbo wins Nobel


By sequencing ancient hominins’ DNA, Pääbo explored “what makes us uniquely human”


By Andrew Curry 


The Nobel Prize in Physiology or Medicine
was awarded this week to Swedish
geneticist Svante Pääbo, honoring
work that illuminates both the dis- 
tant past and the genetic heritage of 
people living today. 

A director at the Max Planck Institute 
for Evolutionary Anthropology (EVA) in 
Leipzig, Germany, since 1997, Pääbo pioneered
the now-booming field of ancient
DNA research. He was the first to success- 
fully retrieve and sequence bits of ancient 
DNA from a Neanderthal in 1997. Then, 
after refining his methods 
to avoid contamination, his 
team sequenced a complete 
Neanderthal genome in 2009 
and a Denisovan, another ar- 
chaic human, the following 
year. His research has offered 
insights into the genetic evo- 
lution of modern humans, 
including a better under- 
standing of disease risks. 

The ancient genomes “al- 
low us to understand what 
makes humans humans,” 
says Johannes Krause, who 
did his Ph.D. in Pääbo’s lab
and now is also a director at 
EVA. Comparing modern and 
extinct human lineages has 
given scientists new insights 
into brain development, au- 
tism, and the immune sys- 
tem’s response to COVID-19 
and other diseases, he notes. “It will prob- 
ably take us a few more years to figure them 
all out,” Krause says. “But it will enable us to 
understand what makes us so special.” 

Pääbo “will be the first to say it’s not just
his work. It’s a team of people,” says paleo- 
anthropologist Chris Stringer of the Natural 
History Museum in London, a Neanderthal 
specialist. “But he’s built a great team.” 

Says evolutionary biologist Beth Shapiro 
of the University of California, Santa Cruz: 
“Svante’s insights ... inspired a generation 
of scientists and established paleogenomics 
as a rigorous field of research. ... [The field] 
has since allowed unexpected insights into 
human evolution, paleontology, ecology, 
and so many other disciplines.” 

When Pääbo first heard the news, he
thought it was “an elaborate prank by my
research group,” he told a press confer- 
ence last week. His interest in ancient DNA 
originated in a childhood fascination with 
ancient Egypt. In a 1985 Nature paper, he 
reported finding small amounts of DNA in 
the cells of Egyptian mummies. That early 
work was questioned because the smallest 
speck of human tissue can introduce modern
DNA. Pääbo then pushed to develop
techniques to minimize contamination dur- 
ing sampling and to differentiate ancient 
molecules from modern ones. 

“His Ph.D. work was on recovering DNA
from mummies and it was pretty much a
failure. He got nothing,” Stringer says. “Another
kind of person might have said, ‘Well,
I’m giving up, this is useless.’ But he didn’t.
He kept going.”

Svante Pääbo holds a skull of one of our extinct cousins, the Neanderthals.

In the late 1980s, Pääbo’s work got a boost
from the polymerase chain reaction tech- 
nology, which made it possible to replicate 
small fragments of DNA many times over. 
He reached out to museums in Germany for 
Neanderthal bone samples, grinding small 
amounts into powder and sequencing the 
DNA contained inside. 

Published in Cell in 1997, the results were 
a watershed moment, showing significant 
amounts of DNA could be recovered from 
bones 50,000 years old or more. The initial 
work relied on mitochondrial DNA, which 
is more plentiful in cells than DNA from the
nucleus. The results were enough to show humans
and Neanderthals were two separate
groups that diverged about 500,000 years 
ago. At first, this was interpreted as showing
the groups had not interbred, a conclusion
upended by Pääbo’s later work.

Fast-moving advances in sequencing 
technology, along with new samples, even- 
tually allowed his team to successfully 
sequence more than 4 billion base pairs. 
They published a draft Neanderthal ge- 
nome in Science in 2010. Comparing the 
Neanderthal and modern human genomes 
showed individuals in Europe and Asia 
today derive between 1% and 4% of their 
ancestry from Neanderthals. 
As anatomically modern hu- 
mans moved out of Africa 
beginning 100,000 or more 
years ago, they evidently in- 
terbred with Neanderthals. 

“Svante ... showed pretty 
convincingly that we had 
interbred with the Nean- 
derthals, and that DNA is 
still active in our genomes,” 
Stringer says. “So it does 
have medical importance.” 

In 2008, Pääbo and his
team recovered DNA from 
a finger bone fragment in a 
Siberian cave that revealed 
a previously unknown an- 
cient human population, now 
known as Denisovans. Here, 
too, the genetic results of- 
fered insights into modern 
human populations, reveal- 
ing that adaptations to living at high alti- 
tude found in modern Tibetan populations 
may be derived from Denisovan ancestors. 

The Nobel Prize in Physiology or Medi- 
cine isn’t often awarded to a single scientist. 
But, Krause says, “Who else would you give 
it to?” Many groups are using Pääbo’s tools,
he adds, “but they are largely his scientific 
progeny. ... Clearly there’s no one else who 
deserves it the way he does.” 

The award is “also a great honor for the 
field in general,” says paleoanthropologist 
Katerina Harvati of the University of Tübingen.
It underlines “how our past can affect
our lives, our biology, and health today.”


With reporting by Gretchen Vogel, Kai Kupferschmidt, 
and Michael Price. 
NASA asteroid test strikes a
blow for planetary defense 


Observers study debris from DART’s crash into a space 
rock and wait to see how much the asteroid was deflected 


By Zack Savitsky, in Laurel, Maryland 


Last week, humanity took its first
step toward bulletproofing Earth, 
as NASA’s unprecedented asteroid- 
deflection test scored a bull’s-eye. Now, 
as early measurements trickle in, the 
work of assessing the effects of the im- 
pact begins. Although the plume of ejecta 
from the spacecraft striking its target makes 
it hard to gauge precisely how much the blow 
disfigured the asteroid and shifted its orbit, 
scientists are picking up early clues. 

As NASA’s Double Asteroid Redirection 
Test (DART) probe closed in on its target, 
researchers at the Johns Hopkins Univer- 
sity Applied Physics Laboratory (APL) were 
glued to large screens. While DART sped by 
a 780-meter asteroid named Didymos and 
homed in on Dimorphos, its 160-meter-wide 
moon, they greeted each new image with 
heartier applause. Then, after one last close- 
up of the moon’s rubble-strewn surface, the 
screen flashed bright red as the craft lost sig- 
nal at the moment of impact. 

The scientists’ tear-stained cheeks lit up 
with reflected light, and the crowd erupted.
A bouquet of fireworks sprouted from behind 
the building, marking the first successful test 
of a strategy for planetary defense—which 
could one day help scientists deflect an as- 
teroid on a collision course with our planet. 

The DART spacecraft, roughly the size and 
mass of a cow, was launched in November 
2021 toward the asteroid system. Ten months 
later, only 1 hour out from impact, DART 
spied its final resting place, Dimorphos, and 
locked onto that target. Team members had 
prepared a series of last-second emergency 
maneuvers in case the spacecraft’s autono- 
mous navigation malfunctioned, but in the 
end, they didn’t have to touch a thing. As 
planned, DART crashed within 20 meters 
from the center of its target at 6 kilometers 
per second. 

“We are so excited to be done,” says Elena 
Adams, a DART mission systems engineer. “I 
can finally sleep.” 

Others, however, were just starting their
work: making sense of the data gathered in
the probe’s last moments and the hours following
the impact. From previous radar observations,
astronomers thought Didymos
might be shaped like a spinning top, but the
probe’s cameras showed it is more squished
like a hamburger. Meanwhile, the smaller
moon appeared surprisingly egg-shaped. The
structure of these bodies holds clues to their
origins and orbits, astronomers note.

A camera on NASA’s Double Asteroid Redirection Test
(DART) spacecraft captured its rubble-strewn target,
Dimorphos, moments before impact (left). Later, a
CubeSat released by DART made an image showing
the debris kicked up by the craft’s collision with
Dimorphos (top); Didymos is in the foreground. From
closer to Earth, the James Webb Space Telescope also
recorded a close-up of the ejecta (bottom).

“I’m stumped,” says Jessica Sunshine, a
planetary scientist at the University of Mary- 
land, College Park, and DART investigator. 
“The primary, Didymos, did not look any- 
thing like we thought it was going to. ... How 
did that system form?—I mean, we've got to 
start over.” 

Multiple teams had trained their ground- 
based telescopes and orbiting observatories 
on the anticipated fireworks. Halfway across 
the globe from APL and late into their night, 
for example, two astronomers huddled in a 
lounge of the South African Astronomical 
Observatory. When DART’s transmission cut 
out, Amanda Sickafoose and Nicolas Erasmus 
examined footage they'd captured with the 
Lesedi optical telescope. They hoped to see 
confirmation of DART’s strike in the form of 
a gradual brightening of the asteroid system, 
as dust and rocks knocked off the asteroid 
reflected more sunlight toward the telescope. 
What they got was even clearer evidence of 
success: Within seconds of DART’s last message,
they watched in exquisite detail as the
asteroid sneezed out a plume of ejecta. “We 
were astonished,” Sickafoose says. 

Through the night, scenes captured by ob- 
servers from all seven continents funneled 
into the DART team’s inbox. Others came 
from space: close-up views of the smack- 
down from the James Webb and Hubble 
space telescopes, as well as LICIACube, a 
small craft that DART ejected 2 weeks be- 
fore the crash. “The data is just incredible,” 
says Alan Fitzsimmons, an astronomer at 
Queen’s University Belfast and DART ob- 
server. “You couldn’t have asked for a better 
test of a kinetic impactor.” 

Rather than shooting off in a smooth 
cone, the debris formed condensed streams 
of material, LICIACube images revealed. 
“I was shocked to see ... how much of the
ejecta was in these jets,” says Philip Metzger, 
a planetary scientist at the University of 
Central Florida. 

The cause of the jets is still a mystery, he 
adds, but understanding them will help sci- 
entists hone their asteroid deflection tech- 
niques. The irregular clumping of ejecta 
could also help explain how bodies early 
in Solar System history collided and aggre- 
gated to form planets. 

With the plume clearing, scientists 
around the world are studying changes in 
the light from the system as Dimorphos 
loops around its partner. In the coming 
weeks, they hope to decode the change in 
the asteroid’s orbit, which will indicate the 
effectiveness of DART’s strike. 

Yet this strategy for thwarting threaten- 
ing space rocks faces a major challenge: “We 
can’t use these techniques unless we know 
where the objects are,” says Amy Mainzer, 
an astronomer at the University of Arizona. 
“If you can’t find them, you certainly can’t
deflect them.” 

Of the Dimorphos-size asteroids that 
could destroy a large city or small coun- 
try, astronomers estimate they’ve only 
found about 40%. NASA has plans to iden- 
tify and track 90% of these looming threats 
with NEO Surveyor, a space-based tele- 
scope project led by Mainzer. But in March, 
the agency proposed to slash more than 
$130 million from the mission’s planned 
budget of $170 million, delaying its comple- 
tion until at least 2028. NEO Surveyor’s fate 
now lies in the hands of Congress, which is 
hashing out the 2023 spending bill for NASA. 

Some hope DART’s bull’s-eye will help 
fuel the effort to identify lingering plan- 
etary threats. We’re no longer defenseless, 
Fitzsimmons contends. “DART has basically 
shown us that we are not like the dinosaurs. 
So, let’s find those asteroids ... and let’s do 
something about it.” 
Uganda’s Ebola outbreak will put
novel vaccines to the test 


As a less-studied ebolavirus kills dozens, researchers 
again scramble to launch trials during a crisis 


By Jon Cohen 


A multipronged international effort
has begun to pull out all the stops to
launch trials of experimental Ebola
vaccines in Uganda, which declared
an outbreak of the deadly disease
on 20 September. According to reports
from Uganda’s health ministry and
the World Health Organization (WHO), as 
of 2 October, the country had 43 confirmed 
and 19 probable cases, including 27 deaths. 
A trial of a vaccine candidate that’s furthest 
along in development could launch before 
the end of this month. 

Proven vaccines exist for Zaire ebola- 
virus, which has led to a dozen outbreaks 
in the neighboring Democratic Republic of 
the Congo (DRC) and was responsible for 
the massive Ebola epidemic in West Africa 
that exploded in 2014. But those vaccines 
cannot control this outbreak because it’s be- 
ing driven by a distant viral relative known 
as Sudan ebolavirus, which last caused an 
outbreak, also in Uganda, in 2012. The Zaire 
and Sudan ebolaviruses “are not variants 
and they’re not strains—they’re different vi- 
ruses,” says Nancy Sullivan, who heads bio- 
defense research at the National Institute of 
Allergy and Infectious Diseases (NIAID). 


As Uganda’s Ebola outbreak expands, medical personnel must wait 
for disinfected protective gear to dry. 


Researchers have long recognized that 
the world badly needs a Sudan ebolavirus 
vaccine: In 2016, Science published a sur- 
vey of 50 leading vaccine researchers who 
ranked the Sudan ebolavirus vaccine as the 
number one R&D priority based on feasibil- 
ity and need. But vaccinemakers have had 
little financial incentive to produce one. 

Three experimental Sudan ebolavirus 
vaccines have been evaluated for safety and 
immune responses in human studies, but 
because outbreaks are so rare, they have 
not had a real-world test. “We are moving 
really fast this time,” says WHO’s Ana Maria 
Henao-Restrepo, a vaccine specialist who is 
coordinating discussions between Uganda 
and stakeholders elsewhere. 

The furthest ahead is a candidate that 
the pharmaceutical giant GSK began to de- 
velop during the West African outbreak; the 
company donated the license for it to the 
nonprofit Sabin Vaccine Institute in 2019. 
The single-dose vaccine contains the gene 
for the surface protein of the virus stitched 
into a chimpanzee adenovirus (ChAd), 
which delivers the payload into cells. The 
U.S. Biomedical Advanced Research and 
Development Authority in 2019 awarded 
Sabin a $128 million contract to develop the 
product, and the candidate has protected 
monkeys challenged with the 
virus and appeared safe in 
small-scale human tests con- 
ducted by NIAID’s Vaccine 
Research Center. 

Henao-Restrepo says there 
is unanimous agreement that 
the Sabin candidate should be 
first in line for a Ugandan trial. 
Health officials there are now 
evaluating a draft proposal. 

NIAID’s Richard Koup, act- 
ing director of the Vaccine 
Research Center, says it has 
100 doses of the vaccine and 
has made them available to 
Uganda. Another 40,000 doses 
exist in bulk form that need 
to be put in vials. The Coali- 
tion for Epidemic Prepared- 
ness Innovations (CEPI), a 
nonprofit that supports R&D 
for vaccines, has worked with Sabin to find
a manufacturer, ReiThera, that can “fill and 
finish” some of the 40,000 doses. 

Nicole Lurie, CEPI’s U.S. director, says 
the current outbreak again shows how dif- 
ficult it is to manufacture and deploy ex- 
perimental vaccines that might help stop 
an outbreak. “This is a great case example 
of all of the gaps that need to be plugged,” 
Lurie says. “This outbreak could get really 
bad and it’s not clear whose responsibility 
it is to manufacture additional doses. The 
fact that we find ourselves in this situation 
now is nuts.” 

A second adenovirus-based vaccine can- 
didate developed by the University of Ox- 
ford has yet to prove itself in a monkey 
study. But Oxford has contracted with the 
Serum Institute of India to produce 20,000 
more doses in the next few months, Henao- 
Restrepo says. And the European Commis- 
sion in July 2020 approved a vaccine made 
by Johnson & Johnson that might protect 
against both ebolaviruses, but it requires 
two doses spaced over 56 days, a drawback 
when a virus is spreading fast. 

WHO’s proposed trial in Uganda, which 
only CEPI so far has offered to help fund, 
will adopt the same unusual strategy as a 
2015 study in Guinea that first proved the 
worth of a Zaire ebolavirus vaccine. The 
researchers, led by Henao-Restrepo, gave 
it to contacts of known cases in what’s 
known as a ring vaccination strategy. To 
sidestep ethical questions about with- 
holding a potential lifesaving medicine in 
a dire situation, the researchers did not 
compare the vaccine with a placebo shot, 
but instead gave some participants the vac- 
cine immediately, whereas others were in a 
“delayed” group. 

Ebola outbreaks historically have come 
to an end without vaccines: Surveillance, 
isolating infected people, strict hygiene ef- 
forts, and personal protective equipment 
for health care teams can all limit spread. 
But the DRC has used Zaire ebolavirus 
vaccines half a dozen times over the past 
4 years to speed the end of outbreaks. 
Vaccines also make it easier to find
contacts of cases “because you’re offering
something to the community,” Henao-Restrepo
says. In addition, mortality tends
to drop for people who receive the vaccine 
soon after infection, and these people are 
also less likely to infect others. 

Sullivan, who will soon leave NIAID to 
become director of the National Emerging 
Infectious Diseases Laboratories at Boston 
University, frets that the world is once again 
finding itself underprepared to combat an 
outbreak of a long recognized viral threat. 
“All of the pandemic preparedness we're do- 
ing isn’t enough,” she says. 
City workers removed fish killed last year
by an algal bloom in Spain’s Mar Menor lagoon. 


CONSERVATION 


This lagoon is effectively a 
person, new Spanish law says 


In a first for Europe, country borrows concept from 
Indigenous science to protect endangered body of water 


By Erik Stokstad 


Only a few years ago, the clear, shallow
waters of Mar Menor, a saltwater la- 
goon off eastern Spain that is Europe’s 
largest, hosted a robust population of 
the endangered fan mussel, a meter- 
long bivalve. But in 2016, a massive 
algal bloom, fueled by fertilizer washing off 
farm fields, sucked up the lagoon’s oxygen 
and killed 98% of the bivalves, along with 
seahorses, crabs, and other marine life. 

The suffocating blooms struck again and 
again, and millions of dead fish washed onto 
shore. By last year, local residents—some of 
whom benefit from tourism to the lagoon— 
had had enough. Led by a philosophy pro- 
fessor, activists launched a petition to adopt 
a new and radical legal strategy: granting 
the 135-square-kilometer lagoon the rights 
of personhood. Nearly 640,000 Spanish citi- 
zens signed it, and on 21 September, Spain’s 
Senate approved a bill enshrining the la- 
goon’s new rights. 

The new law doesn’t regard the lagoon 
and its watershed as fully human. But the 
ecosystem now has a legal right to exist, 
evolve naturally, and be restored. And like 
a person, it has legal guardians, including a 
scientific committee, giving its defenders a 
new voice. “I am very excited,” says Ignacio 
Bachmann-Fuentes, a senior lecturer in 
constitutional law at Pablo de Olavide University.
“This new law has very innovative
and legally powerful elements.” 

The lagoon is the first ecosystem in Eu- 
rope to get such rights, but the approach has 
been gaining popularity around the world 
over the past decade. Bangladesh, for exam- 
ple, has granted personhood to all its rivers, 
including the Ganges; elsewhere, concepts in 
some Indigenous communities have helped 
drive the trend. “It’s taken off like wildfire,” 
says Catherine Iorns Magallanes, an environ- 
mental law expert at Victoria University of 
Wellington (VUW). 

The clearest success story, scholars say, is 
the Whanganui River in New Zealand, which 
was given legal rights by an act of Parliament 
in 2017. The river and its catchment can sue 
or be sued, enter contracts, and hold property. 
In that case, the aim was not to stop pollu- 
tion but to incorporate the Maori connection 
between people and nature into Western law. 
“The river is a physical and spiritual entity 
and its health and well-being are inextricably 
intertwined with the health and well-being of 
its people,” says Gerrard Albert, a member of
the Whanganui tribe who advises the tribal 
council governing the river. 

The legal move appears to be paying off 
for the environment, catalyzing a shift in 
New Zealand to emphasize the well-being of 
rivers over human needs, says Julia Talbot- 
Jones, an expert in environmental issues at 
VUW. Granting rights to the Whanganui 
River has been “an integral steppingstone in
this evolving transition,” she says.

In Mar Menor, where environmental con- 
cerns drove the personhood push, strong 
existing laws already protect species and 
habitats. People and institutions are not al- 
lowed to harm the fan mussel, for example, 
or pollute waters draining into the lagoon. 
But those laws have not been adequately en- 
forced, says Francisca Giménez-Casalduero, 
a marine ecologist at the University of Ali- 
cante. “I am so disappointed, so sad” about 
that failure and the lagoon’s state, she says. 

Spain’s environmental ministry recently 
started to act, committing nearly €500 mil- 
lion over the next 5 years to address pollu- 
tion in Mar Menor. This summer, workers 
removed large masses of algae from the 
lagoon to help prevent anoxia. Upstream, 
government agencies are destroying illegal 
irrigation canals to keep fertilizer out of the 
lagoon. Conservation advocates hope the new 
law will bolster these efforts. 

Now, any citizen can sue to protect Mar 
Menor, for example from too much fertilizer. 
The legal guardians, consisting of represen- 
tatives from government and citizens who 
have yearslong appointments, can suggest 
legal and other actions on the lagoon’s be- 
half. The scientific committee will gauge eco- 
logical health by establishing healthy ranges 
of salinity, oxygen, and other variables. It will 
also identify new threats and advise on res- 
toration measures. A monitoring commission 
will include representatives from environ- 
mental organizations, fishing and farming 
industries, and other stakeholders. Now, “We 
have another tool” for protection, Giménez- 
Casalduero says. “This opens a door for the 
control” of pollution and other problems af- 
flicting Mar Menor. 

The new law could spark a backlash. For 
example, farmers often resist cutting back on 
fertilizer, and the far-right Vox party called 
the initiative “legal nonsense,” and has said it
will appeal. Building broader support will be 
crucial. “Top-down declarations on their own 
can be empty,” says Elizabeth Macpherson,
who researches environmental law at the 
University of Canterbury. 

The new rights also need to be integrated 
into the current legal framework, which rec- 
ognizes the property rights of farmers. That 
step is vital, Talbot-Jones says. “If there is am- 
biguity, they'll end up embroiled in litigation.” 

Other nations will be watching. In the 
United Kingdom, groups are campaigning 
for the rights of rivers, largely in response 
to pollution. “It’s genuinely exciting to see a 
rights of nature campaign succeed in Spain, 
because it really does show the other European
nations what is possible,” says Erin
O’Donnell, an expert in environmental law at 
the University of Melbourne Law School. 
SCIENCE & SECURITY


NSF turns to big data to check 
if grantees have foreign ties 


New CHIPS law mandates tougher enforcement 
of research security by U.S. funding agencies 


By Jeffrey Mervis 


The National Science Foundation
(NSF) will soon begin to crunch several
large databases to identify scientists
who have failed to disclose ties
to foreign institutions in their grant
applications. It is arguably one of the
boldest steps taken by a federal research 
agency to comply with a new law that aims 
to boost U.S. innovation—and prevent 
China and other nations from pilfering 
federally funded research. 

Passed in July after 2 years of debate, 
the CHIPS and Science Act appropriates 
$52 billion over 5 years to bolster the 
U.S. semiconductor industry—and prom- 
ises tens of billions more for fundamen- 
tal research in many fields. But the quid 
pro quo is tougher oversight. Lawmakers 
are particularly concerned about federal 
grantees who have joined a talent recruit- 
ment program funded by China that places 
restrictions on publication or that creates 
a “conflict of commitment” with their U.S.- 
funded institution. 

The new law bars scientists who receive 
federal grants from participating in talent 
programs funded by China, Russia, Iran, 
and North Korea, and prohibits scientists 
employed by the government from join- 
ing any nation’s talent program. (Several 
U.S. allies run similar programs.) It also 
requires agencies to assess the types of 
research most vulnerable to theft, provide 
more training to scientists on how to re- 
duce security risks, and gather more infor- 
mation from grantees. 

NSF is counting on big data to help safe- 
guard its $7 billion research portfolio. The 
agency already examines the biosketches 
included in each grant proposal, which 
provide basic information about appli- 
cants and key team members, including 
institutional affiliations, collaborations, 
research areas, and geographic locations. 
Now, NSF will compare what applicants 
have disclosed to the agency with in- 
formation in two databases of techni- 
cal publications—the Web of Science and 
Scopus—as well as U.S. patent applications. 

The goal is to spot omissions or incon- 


sistencies that could violate agency poli- 
cies, says Rebecca Keiser, head of NSF’s 
office of research security. One concern 
would be an NSF grantee who listed par- 
ticipation in a foreign talent program in a 
published paper or patent application but 
did not disclose that tie to NSF. 
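
The kind of cross-check Keiser describes can be pictured as a set comparison. The sketch below is purely illustrative and does not reflect NSF’s actual system, whose data fields and rules have not been published: the `Researcher` record, the `undisclosed_ties` function, and the sample names are all hypothetical. It assumes affiliations disclosed to the funder and affiliations acknowledged in papers or patents are both available as plain strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Researcher:
    """Hypothetical record; not NSF's actual schema."""
    name: str
    disclosed: set[str] = field(default_factory=set)     # ties reported in the grant application
    acknowledged: set[str] = field(default_factory=set)  # ties named in papers or patent filings

def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial spelling differences do not count."""
    return " ".join(label.lower().split())

def undisclosed_ties(r: Researcher) -> set[str]:
    """Return ties that appear in publications or patents but not in the disclosure."""
    disclosed = {normalize(x) for x in r.disclosed}
    return {x for x in r.acknowledged if normalize(x) not in disclosed}

# Made-up example data.
applicant = Researcher(
    name="Example Applicant",
    disclosed={"University X"},
    acknowledged={"university  x", "Talent Program Y"},
)

flagged = undisclosed_ties(applicant)
print(flagged)
```

A flagged entry in such a comparison would be only a starting point for questions, since a mismatch can also come from a name variant or a database error.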

“Very often the researcher will acknowl- 
edge one of these talent plans in their 
paper because it’s a requirement in their 
contract,” Keiser says. “Now, we’ll be able 
to find that through data analytics.” 

NSF will ask the scientist’s institution 
to explain any discrepancies in hopes of 
resolving them, she says. Those that can’t 
could be referred to the agency’s inspec- 
tor general, who would decide whether to 
launch a formal investigation. 

NSF unveiled its plan to create this new 
“system of records” in the fall of 2021. But 
it has yet to spell out exactly what informa- 
tion it will collect and how it will manage 
those data. That has prompted some anxi- 
ety among academic researchers. 

“We're still waiting to learn the rules of 
the road that apply to this new system of 
records,” says Kristin West of the Council 
on Governmental Relations, which tracks 
federal regulations for its 200-plus mem- 
ber institutions, which include research 
universities. Those institutions want to 
know who will have access to the data 
and how NSF will validate their accu- 
racy. They’d also like to be able to vet any 
discrepancies that NSF finds before the 
agency begins to ask questions. 

That won’t be possible, Keiser says, 
because the information it collects from 
grant applications is confidential. But in- 
stitutions can ask for NSF’s data-mining 
algorithms to do their own analyses, she 
says: “It’s a research tool, and we want 
everybody to have access to it.” 

Besides flagging disclosure discrepan- 
cies, that tool could also help scientists 
identify other groups doing similar re- 
search, Keiser says, opening the door to 
new collaborations. As Keiser imagines 
it, “We might call the university and say, 
‘Hey, the analytics found this really high- 
impact project you may not know about. 
Isn’t that awesome?’” 

Coronavirus detective
Tulio de Oliveira outside
his institute’s offices 
in Stellenbosch, South 
Africa, last month. 


AN ADVOCATE FOR AFRICA 


With rigorous science and good-humored braggadocio, Tulio de Oliveira 
champions coronavirus research from the Global South 


As Americans began to stir in the
early morning hours of Thanks- 
giving Day 2021, a rapt interna- 
tional press corps was listening 
as a pony-tailed scientist in South 
Africa announced the identifica- 
tion of a worrisome new SARS- 
CoV-2 variant. Tulio de Oliveira, a 
Brazilian-born bioinformatician, 
explained that many of the variant’s doz- 
ens of mutations might make it more im- 
mune evasive and contagious—and that it 
was spreading “very fast” in South Africa. 

By Meredith Wadman


“We really would like to be wrong
on some of the predictions,” a sober de
Oliveira told reporters. “The only thing 
that is any good ... is that we detected [it] 
very, very early ... [thanks to] the work of a 
truly big network.” 

This story was supported by the Pulitzer Center.

The next day, the World Health Organization
(WHO) dubbed the new strain
Omicron and classified it as a variant
of concern. That same day, the United
Kingdom and European Union banned 
travelers from South Africa, triggering 
1.5 million cancellations in the country’s 
$5.5 billion tourism industry in 48 hours. 
De Oliveira promptly took to Twitter, call- 
ing the bans “evil” and “stupid,” adding 
that the United Kingdom should pay South 
Africa economic compensation. 

De Oliveira, 46, of Stellenbosch Univer- 
sity, has shot to prominence by delivering 
both vital insights about the shape-shifting 
coronavirus—and some sharp words for the 
Global North. Despite his ready smile and 
good humor in person, on Twitter and else- 
where he complains about countries that hog 
vaccines, unfairly impose travel bans, and 
practice what he calls colonial science. “Re- 
searchers in Africa have to produce at least 
twice as much to get less than half the respect 
of researchers from high-income countries,” 
he wrote recently in a stinging Lancet edito- 
rial. He also uses his platform to cheerfully 
promote the work of scientists from the 
Global South, including that of his own team. 
“The African Science Dream Team Strikes
Again,” he tweeted last month, boasting
about a new Science paper, which included 
more than 300 African co-authors and used 
100,000 genomes to trace the evolution and 
spread of the pandemic coronavirus in Africa. 

De Oliveira “speaks out against scientific 
inequity, especially when it comes to Africa. 
He’s an important voice for ... data credit—
and not having this type of helicopter science 
where data comes from African countries 
and is published by European institutions,” 
says Emma Hodcroft, a molecular epidemio- 
logist at the University of Bern. (She spent a 
few weeks on a fellowship in de Oliveira’s lab 
in 2016.) 

De Oliveira’s provocative comments have 
angered some in the Global North, but his 
coronavirus sequencing work, which has 
repeatedly provided early warnings about 
menacing new strains, has won him high- 
profile fans. “The world owes Dr. Tulio de 
Oliveira a debt of gratitude for his genomic 
sequencing work ... and for sharing his re- 
search,” says WHO Director-General Tedros 
Adhanom Ghebreyesus. 

“If Tulio wasn’t a practicing scientist, the
world would have been slower to know about
and understand emerging variants,” adds
bioinformatician Trevor Bedford of the Fred 
Hutchinson Cancer Center. Time magazine 
put de Oliveira on its list of the 100 most 
influential people of this year and Nature 
named him as one of 10 researchers who 
shaped science in 2021. 

Some scientists note the irony that an 
olive-skinned Brazilian has become a leading 
voice for African science. But others see no 
problem. “He is identified with Africa, and I 
think we are identified with him, regardless 
of his color,” says Sikhulile Moyo, lab director
at the Botswana Harvard AIDS Institute Part- 
nership in Gaborone. Moyo’s lab was the first 
to sequence Omicron and he shared Time’s 
accolade with de Oliveira. “He’s raising a lot 
of Black African scientists.” 

De Oliveira is intent on keeping the con- 
versation going: “My message to the Global 
North ... is that we are here,” he says. “If you
realize that we have resources, experience,
and knowledge ... we can help the whole
world to avoid epidemics and pandemics. But 
we need to be supported and respected. We 
need to be given the chance to lead.” 


DE OLIVEIRA COMES BY his activism honestly. 
His mother, international development 
consultant Maria Joao Nazareth, was born 
in Mozambique but spent her young adult- 
hood in Brazil, where she was jailed several 
times—including when she was pregnant 
with Tulio—for protesting restrictions im- 
posed by the military dictatorship then in 
power. Once, as an architecture student 
in Brasilia, she organized a takeover of an 
anatomy lab to protest not being allowed to 
draw live nudes. (The students sketched ca- 
davers instead.) 

De Oliveira’s father, Nei Simas Andrade de
Oliveira, a transportation engineer, was no 
less a rebel, with books by Che Guevara in his 
massive library. His parents split when Tulio 
was 11. His father was “a very sweet man, that 
despite [the fact] he got married five times 
and had like, six children, no one could get 
cross with him, including all his ex-wives and 
children,” de Oliveira recalls. His parents produced
a crop of activists: Tulio de Oliveira’s
siblings include a human rights lawyer in 
East Timor and a manager for a global hu- 
man rights group. 

Joao recalls her son as a magnet for 
friends—a kid who once led a throng of fel- 
low skateboarders in a takeover of a major 
city street, shutting down traffic. He also 
started coding at age 11. 

In college in Brazil, he majored in molecular
biology but also dove into scientific computing.
In 1997, when his mother returned to Africa,
he went, too, finishing his undergraduate
degree at the University of KwaZulu-Natal 
(UKZN), Durban. There, as an intern, he be- 
gan to produce genomic sequences of a virus 
then laying siege to South Africa: HIV. In the 
process, he says, “[It] became quite clear that 
my laboratory skills were not fantastic.” 

So he earned a Ph.D. at UKZN in bio- 
informatics: the art of using computers to 
analyze genomic sequence data. After grad- 
uating, he joined what he says was “the best 
viral evolution research group in the world” 
at the University of Oxford, working under 
Edward Holmes, now at the University of 
Sydney. He learned the real-world power 
of tracking viral evolution: When foreign 
medical workers were accused of deliber- 
ately infecting children in a Libyan hospital 
with HIV and hepatitis C, he and colleagues 
sequenced the viruses. The analysis showed 
the strains had been established in the hos- 
pital well before the medics arrived. The 
resulting Nature paper and a flurry of di- 
plomacy saved their lives. 

By 2009, de Oliveira was back in South 
Africa as director of what was then the Well- 
come Trust Africa Centre’s genomics pro- 
gram in the rural village of Somkhele. There, 
among people with some of the highest HIV 
burdens in the world, he and colleagues con- 
ducted a large, landmark sequencing study 
showing older men were infecting younger 
women, a revelation that influenced control 
strategies in South Africa and elsewhere. 

After a new director from the United 
Kingdom took over in 2013, de Oliveira 
chafed under what he felt was too much 
control of his work, which was separately 
supported by UKZN. In 2016, he moved 
with his South African wife and their young 
family to Durban, the site of UKZN’s medi- 
cal campus. The next year, with government 
funding, he launched the KwaZulu-Natal 
Research Innovation and Sequencing Plat- 
form (KRISP). He was determined it would 
analyze genomes as fast and as well as any 
leading global genomics institute. 

With Brazilian scientists, the center used 
viral sequences to track chikungunya and 
Zika outbreaks; it also documented growing 
HIV drug resistance. In its first 2 years, its 
scientists landed papers in Science, Nature, 
and dozens of other journals. 

As well as dissecting outbreaks, de Oliveira 
was at pains to make it easier and cheaper for 
others to do so. He had already published lab 
methods for HIV drug-resistance monitor- 
ing in low-income settings and developed a 
software tool that automated HIV subtyping, 
making it far easier for researchers to sort 
through masses of genomic data. At KRISP, 
he and his team cranked out similar tools 
with names like Genome Detective that 
identified disease-causing viruses quickly
and accurately from sequencing data with- 
out hours of painstaking work for the user. 

De Oliveira was also set on training Afri- 
cans and reversing the scientific brain drain 
that he saw drawing top young African tal- 
ent to train in rich-world laboratories— 
from which many didn’t return. To date, 
he has trained scientists from 39 African 
countries, he says. Over the past decade, he 
estimates he and Richard Lessells, an in- 
fectious disease physician at UKZN, have 
also trained “easily between 5000 to 10,000 
medical doctors and nurses” to use patho- 
gen sequencing to understand outbreaks 
and apply targeted treatment. 

“We have this very good relationship with 
most of the clinicians and nurses in the country,”
de Oliveira says. When a worldwide pandemic
descended, those relationships turned
out to be crucial. 


WITH HIS DEEP EXPERIENCE analyzing patho- 
gen evolution, de Oliveira understood from 
the first days of the pandemic that genomic 
surveillance of SARS-CoV-2 would be vital. In 
early April 2020, 1 month after South Africa’s 
first coronavirus case was identified, he and 
colleagues began to investigate a SARS-CoV-2 
outbreak at a Durban hospital. De Oliveira, 
Lessells, and Yunus Moosa, head of infectious 
diseases at UKZN, used gumshoe epidemio- 
logy and KRISP sequencers to find that a sin- 
gle patient had likely introduced the virus to 
the hospital, and that movement of patients 
between wards had probably hastened its 
spread; 15 patients died. 

The team streamlined lab processes and 
soon published protocols for speedily se- 
quencing the new virus. They also found 
that the viral genomes had a common, Eu- 
ropean ancestor and that the hospital’s first 
two cases were in people who had recently 
traveled to Europe. 

De Oliveira and Lessells began advising 
the physicians and nurses in their network 
on spacing beds and separating patients. 
Their 37-page report on the outbreak quickly 
became a manual for hospitals the world 
over on how to prevent the spread of SARS- 
CoV-2; in the days following its release, it 
was accessed more than 10,000 times in 
168 countries. 

The team was eager to track SARS-CoV-2 
evolution in all of South Africa. But it 
needed partners and funding. De Oliveira 
wooed South Africa’s National Institute for 
Communicable Diseases (NICD) by “show- 
ering them with kindness,” he says, sending 
scientists at the public health agency his 
new protocol along with $50,000 worth of 
needed reagents. 

By May 2020, de Oliveira had won gov- 
ernment funding, having corralled nearly a 
dozen partners, including NICD, national
public health labs, and university groups into 
a sprawling surveillance consortium. While 
other countries lagged—the U.S. govern- 
ment didn’t significantly fund SARS-CoV-2 
sequencing until early 2021—de Oliveira had 
already put his team in place. It vaulted the 
country to global prominence when SARS- 
CoV-2 began to shape-shift. 

“He set it in motion,” says Anne von
Gottberg, a clinical microbiologist at NICD, 
South Africa’s equivalent of the U.S. Centers 
for Disease Control and Prevention. “He got 
the first tranche of funding and he generously 
and magnanimously shared his expertise ... 
and shared reagents and standard operating 
procedures. [And the teamwork] really was 
what made [it] so tremendously powerful.” 

Still, von Gottberg and others wryly note 
that de Oliveira’s outsize media profile can 
overshadow the work of dozens of scientists. 
“He shouldn’t be the only spokesperson. 
There are many of us ... working very hard 
in the background,” von Gottberg says.

De Oliveira himself stresses the teamwork 
behind his success. “Scientists like fight- 
ing,” he says, laughing. “The whole secret is 
that we manage ... to put all the scientists 
together in the country. I have enormous pa- 
tience to make people work together.” 

In mid-October 2020, hospitals in the East- 
ern Cape province saw an unexpected up- 
surge in SARS-CoV-2 cases and shipped swab 
samples from patients overnight, on ice, to 
the KRISP team. They spotted what looked 
like a new variant and notified others in the 
network, who increased sampling in the re- 
gion. By December, the team confirmed the 
identity of what WHO later named the Beta 
variant. In Nature, de Oliveira’s network de- 
scribed the new mutations in the virus’ spike 
protein that, researchers later showed, made 
Beta deadlier than the original virus. 

Within days of the Beta announcement, 
the U.K. government, already awash in a 
highly contagious home-grown variant, Al- 
pha, identified two cases of Beta on U.K. 
shores. Then-U.K. health minister Matt 
Hancock called himself “incredibly grateful” 
to the South Africans for their transparency. 
Then he pronounced himself “incredibly 
worried” about Beta and banned travelers 
from South Africa—for 291 days. 

De Oliveira promptly took to Twitter. “One 
release[s] results & is punished?” he asked. 

It was a preview of what happened 
11 months later, when South Africa again 
warned the world, this time about Omi- 
cron. Scientists again praised the country 
for its openness, but countries including 
the United Kingdom reimposed travel bans, 
despite WHO’s opposition on economic 
grounds. De Oliveira and colleagues began 
to field death threats from South Africans
infuriated that scientists had gone public
about Omicron and tripped off another 
round of punishing restrictions. 

Today, Omicron has spread worldwide; the 
variant and its spinoffs—two of them, BA.4 
and BA.5, also first described by de Oliveira’s 
team—now account for an estimated 42% 
of the coronavirus cases sequenced globally 
since the pandemic began, as well as untold 
millions of deaths. But Omicron might have 
done still more damage if the South Africans 
hadn’t alerted the world so quickly, giving 
other countries time to prepare. 

That’s why travel bans are more than un- 
just, de Oliveira says. They “are a massive 
problem for global health,” he says, a disin- 
centive for scientists to publicize new vari- 
ants. “They may choose to keep quiet so they 
are not punished. And that could easily cause 
new epidemics and pandemics.” 


DE OLIVEIRA’S RHETORIC can have a bitter 
edge. This spring, he lashed out at Tom 
Wenseleers, an evolutionary biologist at 
KU Leuven, for putting his own name at 
the bottom of Twitter graphs created with 
data from the database GISAID and South 
Africa’s NICD. (When he tweeted the graphs, 
Wenseleers did credit both sources.) “Some 
global north scientists wait for ‘free data’ 
to analyze and even put their names on
the graphs! Shame @TWenseleers,” de
Oliveira tweeted.

Minutes before de Oliveira’s tweet,
Wenseleers apologized on Twitter, calling
himself remiss in not more directly crediting 
“the incredible work” of de Oliveira and oth- 
ers in South Africa in generating sequence 
data; de Oliveira himself later apologized for 
his angry tweet. But Wenseleers also bristled 
at de Oliveira’s charge of scientific colonial- 
ism, and the pair continue to spar on Twitter. 

Similarly, when de Oliveira tweeted in 
July complaining that pandemic prevention 
institutes were being established “in Seattle, 
Washington, Berlin, Geneva and not in South 
Africa, Brazil, India, Indonesia,” Dutch virologist
Marion Koopmans of Erasmus University
Medical Center shot back: “The North
versus South narrative does not help. ... We 
should collaborate, not polarize.” 

Deepti Gurdasani, a statistical geneti- 
cist at Queen Mary University of London, 
has used Twitter to flag excess deaths from 
Omicron subvariants in South Africa. That 
annoyed de Oliveira, who contends other 
respiratory viruses have fed into the excess 
deaths and that recent Omicron subvari- 
ants were less severe than the original. 
Gurdasani charges that de Oliveira “seems 
to ... troll and bully researchers” who coun- 
ter a narrative that Omicron is relatively 
benign. “That, to me, has been really, really 
disappointing,” Gurdasani says. 

Tulio de Oliveira (left) works with Lucious Chabuka, a master’s
student from the Public Health Institute of Malawi. 


“The tone of my Twitter is sometimes 
harsh,” de Oliveira concedes. “But the ... sci- 
entific discrimination, the economic damage 
from travel bans, the hoarding of vaccines 
that we suffer[ed] in South Africa during 
this pandemic was much, much harsher.” 

De Oliveira has used his ever-increasing 
visibility to advocate for the Global South 
on other issues. In June, he and genomi- 
cist Christian Happi of Redeemer’s Univer- 
sity in Nigeria organized a letter signed by 
29 scientists, most of them African, protest- 
ing a stigmatizing Africa-specific monkey- 
pox naming scheme and calling for a new 
nomenclature. Days later, WHO announced 
it would abandon the “Congo Basin” and 
“West Africa” monkeypox clade names. 

It helps that de Oliveira can bend ears at 
the highest levels. “Thank you Tulio. Good 
to hear from you,” Tedros texted to him dur- 
ing the monkeypox lobbying campaign. “I 
will get back to you ASAP.” 

“Tulio [is] becoming a voice for Africa,” 
says his senior grants manager, Suzette 
Grobler. She has worked with him since he 
was what she calls a rule-breaking “wild 
child” who once absconded to Brazil with 
an Africa Centre-owned laptop. (She brow- 
beat him into sending it back.) “Life has 
just afforded him the opportunity right now 
that people want to talk to him and want to 
hear what he has to say.” 

“He is the only scientist I think I know in 
Africa who is very vocal, who says the truth 
always,” says Lucious Chabuka, a laboratory 
technician at the Public Health Institute of 
Malawi and a master’s student and fellow 
at de Oliveira’s new institute, the Centre for 
Epidemic Response and Innovation (CERI). 

Chabuka and others say that when speak- 
ing to junior scientists, de Oliveira demon- 
strates the respectful attitude he demands 
from others. “I was really impressed. ...
He lets people contribute,” says Fortunate
Natwijuka, a laboratory technologist at
the Uganda Virus Research Institute who
attended a freewheeling session in “meeting
room 1”—a table and lawn chairs under
the trees outside the renovated winery that 
serves as CERI’s administrative headquar- 
ters. When a colleague introduced her to 
de Oliveira, she says she was surprised by 
his relative youth and said, “I didn’t realize 
this was your professor!” Tulio responded, 
“Don’t worry about the professor—I’m 
Tulio,” she recalls. 

De Oliveira, whose idea of dressing up for 
visitors consists of adding a rumpled jacket 
to his chinos and well-worn loafers, appears 
unconcerned with status. “We always chat.
... You can ask him anything,” says Onke
Tsewu, the security guard at the building 
that houses CERI. “He is the sweetest, cool- 
est guy I have ever known.” 


EVEN BEFORE HIS NETWORK identified Omi- 
cron, de Oliveira’s sequencing prowess was 
drawing job offers at home and abroad. In 
July 2021, he accepted one from Stellen- 
bosch University to launch the new genom- 
ics center, CERI, which has labs in a newly 
refurbished $100 million university research 
building in Cape Town. CERI’s staff of 20 is 
expected to double in the next year; it also 
plays host to a steady stream of visiting fel- 
lows from other African countries. And it has 
powerful backers: In its first year it landed 
grants and donations worth $18 million from 
groups including the World Bank, the Rocke- 
feller Foundation, the European Commission, 
the Abbott Pandemic Defense Coalition, and 
the U.S. National Institutes of Health. 

At UKZN’s request, de Oliveira continues 
to run KRISP, making the 140-minute flight
to Durban frequently. For now, he himself
has no office at CERI, though he’ll have one 
when the building is complete early next 
year. “To sit in a big office and feel very self-important—I
have no patience for that,” he
says. “Then you get away from the science.” 

Recently, de Oliveira gave a tour of CERI 
to a group of German politicians, part of a 
parade of dignitaries who regularly visit. 
He showed off a Goldilocks set of sequenc- 
ers crowned by two million-dollar Illumina 
machines and including three 11-kilogram 
Oxford Nanopore sequencers useful for train- 
ing visiting fellows who will use similar ma- 
chines back in their own countries. 

In a practiced stream of genial talk, de 
Oliveira noted the two biggest sequencers, 
donated by the Chan Soon-Shiong Family 
Foundation, give the institute the most se- 
quencing horsepower in Africa, making it a 
sentinel for future pandemics. One politician 
leaned in for a selfie with de Oliveira, who 
smiled obligingly and told the group, “I have 
received thousands of journalists, many from 
Germany—all the main TV channels. They 
come and say, ‘Oh, were you surprised that 
South Africa did so well on the scientific response
to COVID?’ And I say—‘No!’”

After the Germans left, he laughed at him- 
self. “I’m good at selling CERI. That’s impor- 
tant. That’s my job.” 

De Oliveira plans to merge CERI with 
KRISP soon, but his long-term ambitions 
are bigger. He envisions multiple campuses 
in several African countries, delivering hun- 
dreds of opportunities for budding local 
scientists and forcing the world to contend 
with Africa as a scientific power. “My dream 
is to show the world that the global South— 
Africa, Latin America, and Southeast Asia— 
are the best place in the world to identify 
new pathogens and control them.”

Dear NextGen Voices,

I am an undergraduate student trying to get
exposure to research in a lab. I found an internship,
but the graduate students and even postdocs
seem to resent the idea of an undergraduate
in the lab. Some have said that I take up valuable
time and resources that should be devoted to
more experienced lab members. I am often
excluded from even informal conversations and
activities. This unwelcoming culture took me by
surprise and has decreased my motivation to
pursue science. What should I do to improve my
current situation, and how can I learn from this
experience when making future career decisions?


Sincerely, 
Unwelcome Undergrad 


NEXTGEN VOICES: ASK A PEER MENTOR 


When internships disappoint 


Being the most junior member of a lab is a rite of passage for many researchers. We asked young scientists
to act as peer mentors by providing advice to an undergraduate student who was excited to find an internship
but disappointed by the unwelcoming atmosphere in the lab. In response, mentors ask questions to help the
student reflect, share their own experiences, and offer advice about how to move forward. Read a selection of
their thoughts below. Follow NextGen Voices on Twitter with hashtag #NextGenSci.

—Jennifer Sills


Make yourself useful


If you were a doctoral student, what would
you look for in an undergraduate student
before letting them participate in your
project? As a junior who has worked in two
labs, I suggest carefully reading relevant
research articles and offering to assist
graduate students with their experiments
(such as preparing experimental materials
or processing data). If we do a good job
with the tasks assigned to us by graduate
students, they will be more likely to trust
us and provide more opportunities.

Rui Tang

Division of Life Sciences and Medicine, University
of Science and Technology of China, Hefei, Anhui
230027, China. Email: zxcvobnm@mail.ustc.edu.cn


What are some tasks you can do to help
others? A hardworking, sincere, and proactive
attitude will take you far. Maybe you
can make an agenda for a meeting, organize
files or data, or look up interesting
data or papers for your colleagues. Shadow
senior lab members, ask questions, and
make detailed notes of their procedures so
you can help them next time. Read extensively
and challenge yourself to come up
with one new question every day.

Tina Bharani

Department of Surgery, Harvard University,
Brigham and Women's Hospital, Boston, MA
02115, USA. Email: tbharani@bwh.harvard.edu


Have you thought about getting your own
research grant? Before starting my research,
I wrote a proposal and got funding from
my university. I use my own funding to pur- 
chase experimental materials and reagents. 
Talk with your PI about the possibility— 
some universities have established research 
programs or funding for undergraduates. If 
you bring your own resources, you will have 
more in common with other lab members 
and they will see your contributions to lab 
development. In addition, proposal writing 
is great for your career development, given 
that grant application is the first step for 
your future independent research. 


Jian Ding 

Division of Life Sciences and Medicine, University 
of Science and Technology of China, Hefei, Anhui 
230027, China. Email: dj2019@mail.ustc.edu.cn 

Is your lack of experience the only explanation
for such poor treatment? Some
labs are unwelcoming regardless—I had
a similar experience as a postdoc. I would
recommend trying to minimize the time
you require of others. Clean up your
lab bench when you finish working and
notify others when materials need to be 
ordered before the supply is exhausted. If 
you find yourself stuck, spend some time 
searching the internet and primary litera- 
ture before asking someone. Seek advice 
from your fellow labmates or the PI if you 
still have questions. 

Kefeng Li 


Faculty of Applied Science, Macao Polytechnic
University, Macao SAR, China. Email: kli@ucsd.edu


Are you familiar with your lab colleagues’ 
work? I felt awkward in a lab where every- 
one was busy with their own projects. Try 
to figure out what experienced lab mem- 
bers need. Study the papers they have 
published and ask them substantive ques- 
tions about their work, on email or social 
media if talking face-to-face is uncomfort- 
able. Or find a time to talk to them when 
they’re not busy, such as during lunch. 
Demonstrate that you have the potential 
to contribute at least to the discussion
of their research. Once the connection is
established, you can offer to participate in 
their current projects. 

Ju Wen 

School of Liberal Education, Chengdu Jincheng
College, Chengdu, Sichuan 611731, China.
Email: jupiter@cdjcc.edu.cn


Communicate your concerns 


Have you identified the projects to which 
you can contribute? Well-established
labs function like families, and finding
a space in such a tight-knit community
is challenging. Initiating a venture by
yourself will be met with resistance. 
Instead, discuss with the PI how you can 
contribute, ideally by helping with several 
projects that use overlapping techniques. 
Then meet with each project lead and PI. 
If, despite your efforts, you continue to be 
ostracized, save your energy for a more 
friendly work culture. 

Suchitra D. Gopinath 

Translational Health Science and Technology
Institute, Faridabad, Haryana 121001, India.
Email: sgopinath@thsti-res.in


Have you considered coordinating a meet- 
ing with the PI and your graduate mentor 
to discuss expectations and project goals? 
As an undergrad new to the lab, I actively 
communicated with my mentor and set 
clear expectations and quantitative objec- 
tives. Setting clear expectations between 
mentor and mentee helps keep everyone
on the same page regarding the responsibilities.
Setting incremental, achievable,
and quantitative quarterly goals helps
to keep the responsibilities in check.
Convince your mentor that you are an
asset to them rather than a burden.

Teng-Jui Lin

Department of Chemical Engineering, University
of Washington, Seattle, WA 98105, USA.
Email: tlinlO@uw.edu


Have you tried connecting with col- 
leagues on a personal level? Remember 
that the PI sees potential in you and 
decided to welcome you into the lab 
despite your relative inexperience. 
Consider asking lab members about how 
they first started—experienced research- 
ers sometimes forget that all researchers 
had to start somewhere and that they are 
no exception. More broadly, keep an open 
scientific mind and ask many questions. 
Initiative and curiosity are almost always 
appreciated and can earn you respect 
among researchers. By asking good 
questions, you can demonstrate that you 
bring something unique to the table even 
without years of experience. 

Jay X. J. Luo 


Johns Hopkins University, Baltimore, MD 21218, 
USA. Email: xluo28@jhu.edu 


Cultivate empathy 


Have you tried to empathize with the 
graduate students and postdocs? Although 
science is a noble and exciting pursuit,
it is also hard and highly competitive.
The members of your lab are likely under
tremendous pressure to produce publi- 
cations, patents, or grants. They might 
worry that your presence will hinder their 
progress. If you approach the situation 
with empathy, you might find a way to 
help. By offering to care for lab rats or do 
simple, repetitive, and time-consuming 
procedures, you can develop a good rap- 
port with your lab colleagues and get 
hands-on experience. 

Qianjun Wen 

The Affiliated Hospital, Guizhou Medical
University, Guiyang, Guizhou 550004, China.
Email: wqjtmmu@126.com


Why might the graduate students and 
postdocs in the lab resent the time and 
resources that you are using? When 
people feel like they don’t have a say in 
how they use their time, they can come 
to resent the people that they should sup- 
port. Maybe a postdoc planned to spend 
the next few weeks finalizing an impor- 
tant paper but was then told by the PI to 
help you instead. Meet with your PI and 
the other members of the lab to discuss
expectations and find some middle
ground. For example, maybe a postdoc 
could help you for a set time each day, 
leaving the rest of their day free. Clear 
expectations could reduce resentment. 
If, despite your efforts, the situation 
doesn’t improve, see if other support is 
available, and don’t be afraid to leave if 
the costs to your wellbeing are greater 
than the benefits of the internship. 
Finally, please don’t give up on science 
because of this experience. There are lots 
of different labs, and many are very sup- 
portive and friendly! 

Katherine Davis 

Department of Infectious Disease Epidemiology, 


Imperial College London, London W2 1PG, UK. 
Twitter: @kd_katdavis 


Have you considered why this unwelcom- 
ing culture exists in the lab? Teaching is 
an essential part of academia, and a suc- 
cessful teaching culture starts with the 
PI. Does the PI in this lab think teaching 
is a waste of time? Is it that lab members 
do not receive recognition for their time 
and effort spent teaching? By talking to 
people from other labs, you can identify 
what is different in this particular lab. 
Understanding the lab dynamics will 
help you decide how to move forward. 
Norman van Rhijn 

Manchester Fungal Infection Group, University 


of Manchester, Manchester, UK. 
Twitter: @NormanRhijn


Demand respect 


Do you value interesting research more 
than a welcoming environment? As a 
female undergraduate, I was devastated 
when I received far harsher treatment 
from my lab supervisor than did my 

male counterparts. I felt like there was 

no action I could take to protect myself 
other than to take up less space. But then 
I realized that no one could fully exclude 
me without my consent. Remember that 
your presence in the lab, even as an under- 
graduate, is incredibly valuable! It is worth 
remembering that every one of those
graduate students and postdocs was once
an undergrad who needed training, mentors,
and patience. Fighting for the space
you deserve to take up is a valuable 
exercise and has the potential to change 
your colleagues’ minds. Ask questions, 
demand answers, and do your best to 
remain optimistic. I speak from experi- 
ence when I say there are wonderful 
labs that welcome undergrads and make 
you a part of their family. 


Name withheld 
Philadelphia, PA, USA. 
Where else can you get the skills, knowledge,
and experiences you were hoping
to gain from this internship? You deserve 
to be welcomed and respected. As the 
new person, it is not your responsibility 
to fix the lab culture; that falls on the 
PI. Your responsibility is to take care of 
yourself and your integrity. Leave this 
lab, and keep looking until you find a 
place where you are wanted and valued. 
You cannot do your best work when 
disrespected, and the value of your 
work will be discounted because the 
right people won’t see it. If you have the 
confidence, you can tell the hiring man- 
ager or PI why you are leaving, but you 
don’t owe them that. You owe yourself a 
healthy, constructive, productive work 
environment. 

Sarah M. Anderson 


Harrisburg, PA 17102, USA. 
Email: sarah.m.anderson.10@gmail.com


What vision did you have for this intern- 
ship when you began? Talk to your PI 

or supervisor about your dissatisfaction 
with being excluded from lab activities. 
Remind them that the more closely you 
are involved, the more productive you 
can be as a member of the lab. If your 
PI suggests that you avoid concerning 
yourself with—for example—the broader 
motivations behind the research to 
which you are supposed to be contrib- 
uting, understand that this is not how 
research experiences are supposed to be. 
The reality is that research is far more 
gratifying and exciting when you find 

a lab that is willing to invest in your 
success. Moving forward, use your poor 
experience as a reference point: Search 
for labs that will immerse you in the sci- 
ence and see your value beyond carrying 
out menial tasks. 

Rishi Jai Patel 

Department of Chemistry, University of 


Pennsylvania, Philadelphia, PA 19104, USA. 
Email: ripatel@sas.upenn.edu 


Find support 


Have other undergraduates had simi- 

lar experiences in this lab? If you feel 
unwelcome in the lab, chances are others 
have, too. I suggest that you connect 
with your lab’s alumni to learn how 

they overcame the resentment toward 
undergraduates. Lab alumni will likely 
be sympathetic to your situation and 
may provide helpful tidbits about how 

to connect with your colleagues. Perhaps 
you will find that you and a postdoc are 
from the same hometown or have similar 
hobbies. Such information can help you 
identify potential mentors in the lab
and initiate informal conversations with 
them. Simply bonding with one of your 
postdocs may be enough for you to carve 
out an academic role in your lab and 
establish a good rapport with the rest of 
your research team. Learning from your 
predecessors can give you an edge in 
building those relationships. 

Sai Sarnala 

Department of Chemistry, University of 


Pennsylvania, Philadelphia, PA 19104, USA. 
Email: ssarnala@sas.upenn.edu 


Are there any support networks in place? 
In my experience, as a member of minor- 
ity groups, scientific communities are not 
always inclusive of individuals with needs 
outside the norm. Practice self-care first. 
Then, try to find people who may be in 

a similar position or who can listen and 
understand your concerns. Having a sup- 
port network (for emotional and academic 
support) can help you find the strength to 
manage issues and advocate for yourself. 
Fernanda Suemi Oda 


University of Kansas, Lawrence, KS 66045, USA. 
Email: oda@ku.edu 


Take the long view 


Have you considered that other fields 
have unwelcoming spaces as well? I 
would discourage you from changing 
your career goals based on this experi- 
ence. In any field—not just science—new 
trainees are excited by the possibilities 
of a career, whereas those with more 
experience tend to be more cynical about 
the field’s limitations. Not everyone 
understands how to nurture newcomers 
and keep their hope alive. Focus on your 
research question and hone the skills 
you need. You will encounter similar 
challenges in every field, so learning to 
overcome them will help you no matter 
what you decide to pursue. 

Garima Singh 


Mott MacDonald, Noida, UP 201010, India. 
Email: Singhg20@gmail.com 


How can you use this research experi- 
ence to make yourself a more sought-out 
candidate in other labs? Create a presen- 
tation summarizing what you learned to 
build your resume, and then apply for 
internships in diverse fields. Never make 
your future career decisions based on 
one experience; instead, use scientific 
principles (i.e., at least three internships 
or research experiences) to inform your 
career choices. 


Naga Rama Kothapalli 
Trudeau Institute, Saranac Lake, NY 12983, USA. 
Twitter: @rama_k_n 


Do you think all the lab members feel 
this way about you, or are they follow- 
ing the lead of more senior members? It 
may help to try to connect with just one 
person in the lab, maybe someone who 
shares your interests and has not been 
hostile to you. Unfortunately, some labs 
prioritize competition over collaboration. 
In the future when looking for work, talk 
with people that work in the lab before 
interviewing or speaking with the super- 
visor. Use what you’ve learned to identify 
labs with better social cultures. 

Natalie Scott 

US Department of Agriculture, Parlier, CA 

93648, USA. Email: natalie.scott@usda.gov 


Are you able to achieve your goals in 
this work environment? I have also 
encountered lab members that don’t 
seem to respect what I’m doing. In 

my eyes, this is good training for how 
reviewers will treat you when you sub- 
mit papers. Reviewers often approach 
your work with substantial skepticism. 
Practicing being confident in your work 
and learning how to be convincing, 
especially to those that are skeptical, 
will help greatly in the long run. Also, 
next time you look for a lab position, 
explore the website’s gallery tab. On 
occasion, you’ll find a gallery inundated 
with images from lab dinners, farewell 
parties, or thesis defense celebrations— 
all with members smiling together. 
Pictures of smiling people don’t guaran- 
tee a great lab experience, but they’re a 
good place to start. 

Jackson Ross Powell 

Vagelos Molecular Life Sciences Program, 


University of Pennsylvania, Philadelphia, PA 
19104, USA. Email: jrp24@sas.upenn.edu 


Have you considered how you can find 

a better fit next time? In my experience, 
the cultural norms across departments 
and labs vary tremendously. Next time 
you have an opportunity to apply to labs 
and programs, carefully study the profiles 
of the academic staff, research associ- 
ates, and students. Look for a diversity 
of academic backgrounds. Do the team 
members come from the same or similar 
institutions? Do the team members have 
similar niche research areas? I have 
found that departments that are either 
multidisciplinary or have researchers 
with a breadth of interests and methods 
are rewarding and inclusive. 

Samuel Nathan Kirshner 

School of Information Systems and Technology 
Management, University of New South Wales, 


Sydney, NSW 2052, Australia. 
Email: s.kirshner@unsw.edu.au 
DORMANCY


An electric alarm clock for spores


Inactive spores integrate stimuli over time through 
stored electrochemical potential 


By Jonathan Lombardino¹,² and
Briana M. Burton²


To survive among complex and transient
environmental conditions, biological
systems have evolved sensitive
mechanisms not only to integrate
informative stimuli but also to mount a
decisive response. Sensing the onset
of harsh conditions, spore-forming organ- 
isms, including bacteria and fungi, undergo 
energetically costly developmental repro- 
gramming to enter dormant states that 
persist through a multitude of biological 
insults including nutrient deprivation, ex- 
treme heat, and desiccation. Notably, spores 
formed by bacteria such as Bacillus subtilis 
can remain in their dormant, metabolically 
inactive state for extended periods of time, 
potentially even hundreds of years. How 
can a dormant entity respond in a dynamic 
way to its environment and thus commit to 
resume biological activity and growth at an 
appropriate time? On page 43 of this issue, 
Kikuchi et al. (1) reveal that exit of B. subti- 
lis spores from dormancy may be explained 
by electrochemical-state switching, similar 
to that used by neurons. 
Entry into sporulation and the subse- 
quent exit from dormancy, known as germi- 
nation, are both irreversible developmental 


¹Microbiology Doctoral Training Program, University of Wisconsin-Madison, WI, USA.
²Bacteriology Department, University of Wisconsin-Madison, WI, USA.
Email: briana.burton@wisc.edu
decisions. However, unlike sporulation, the
initiation of germination must occur in the 
absence of translation, gene expression re- 
programming, and adenosine triphosphate 
(ATP) synthesis. B. subtilis spores contain 
high concentrations of Ca²⁺-dipicolinic acid
(CaDPA), which contributes to extremely low 
water content and metabolic activity, as well 
as stable storage of nucleic acid. Spore DNA 
itself is also surrounded by multilayer pro- 
teinaceous and peptidoglycan surfaces that 
severely restrict access of molecules into and 
out of the spore. Thus, germination deviates 
from traditional mechanisms that underpin 
signal transduction pathways (2). 

In spite of these deviations, the return 
to metabolic activity requires relatively 
few macromolecules. Germination recep- 
tors in the inner membrane respond to 
small-molecule cues, or germinants, in- 
cluding specific D-sugars, L-amino acids, 
or peptidoglycan fragments (3, 4). Next, 
these germination receptors are thought to 
initiate movement of monovalent cations 
across the inner membrane, and subse- 
quently the release of CaDPA outside the 
spore. Thereafter, CaDPA is replaced by 
water molecules to begin rehydration, a 
necessary step in the resumption of meta- 
bolic activity and biosynthesis. However, it 
remains unknown how the movement of 
monovalent cations activates the release 
of CaDPA. 

False-color transmission electron micrograph shows
Bacillus subtilis in the final stages of spore formation.
The spore has many protective membranes
(red and green) that restrict access of molecules.

Just as spores are recalcitrant to harsh
environments, so too has been the discovery
of the molecular mechanisms governing
the initiation of germination. In 1981,
it was proposed that spore germination 
could be triggered by a germination recep- 
tor responding to a cognate germinant (5). 
In individual cells, commitment to germi- 
nation increases over repeated exposures 
to germinant signals (6). These observa- 
tions led to the idea that spores could re- 
tain a “memory” of prior exposure to ger- 
minant (7). It was proposed that a physical 
change in spore proteins could record this 
past exposure. The nature of this change, 
however, remained elusive. 

Additional examination of germination 
initiation exposed an element of stochas- 
ticity in the process, which appeared inde- 
pendent of prior exposure to germinant (8). 
This led to the description of spontaneous 
germination (9), a process that depends on 
a late sporulation transcription factor that 
influences the expression of spore coat 
genes. How modulation of the outer layers 
protecting the spore ultimately feeds into 
the switch from dormancy to germination 
remained unclear. Thus, although memory 
of exposure to germinant and spontane- 
ous germination described the global 
responses, the conceptual and molecu- 
lar mechanisms for this signaling switch 
lacked a physical explanation. 

Building on foundational principles in 
neurobiology, Kikuchi et al. investigate the 
hypothesis that electrochemical potential 
in the absence of cellular energy allows 
the spores to integrate germinal signals. 
Through the application of a mathematical 
framework to understand the electrochemical
contributions of potassium-ion (K⁺)
concentration differences across the dor- 
mant spore membrane, the authors demon- 
strate stepwise changes in electrochemical 
potential when spores are repeatedly and
transiently exposed to a germination stimu- 
lus. Upon reaching the threshold potential, 
germination begins (see the figure). This 
mechanism of activation is reminiscent of 
“integrate-and-fire” activation models previ- 
ously applied to neurons. 
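
The stepwise approach to a threshold can be illustrated with a toy calculation. The sketch below is not the model of Kikuchi et al.; it assumes, purely for illustration, that each transient exposure shifts a stored quantity by a fixed step, that nothing is lost between exposures, and that commitment occurs once a threshold is crossed. The function name, the arbitrary integer units, and all numerical values are hypothetical.

```python
def pulses_until_germination(initial_level, step_per_pulse, threshold, max_pulses=1000):
    """Return how many stimulus pulses a toy spore needs before it "fires".

    All quantities are in arbitrary units and are illustrative assumptions,
    not measured values.
    """
    level = initial_level
    for pulse in range(1, max_pulses + 1):
        # Integrate: each transient exposure adds a fixed increment,
        # and (by assumption) nothing decays between exposures.
        level += step_per_pulse
        # Fire: commitment is all-or-nothing once the threshold is reached.
        if level >= threshold:
            return pulse
    return None  # threshold never reached within max_pulses


# Two toy spores that differ only in their starting level.
far_from_threshold = pulses_until_germination(0, 3, 10)
near_threshold = pulses_until_germination(4, 3, 10)
print(far_from_threshold, near_threshold)  # 4 2
```

In this toy picture, a spore that starts closer to the threshold commits after fewer exposures, which is one way to picture why initial conditions could matter.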

Integrate-and-fire describes a model of 
the action potentials of neurons, whereby 
small preliminary inputs are integrated un- 
til they reach a threshold that is required to 
“fire” an action potential in an all-or-nothing 
manner (10). This membrane potential can
be modeled through the Hodgkin-Huxley 
framework, which derives the electrical con- 
ductance as the summation of the individual 
contributions of voltage-gated sodium- and 
potassium-ion channels in a neuron (11). 
Given the universal ability of a membrane
to store electrical energy as a biological
capacitor, Kikuchi et al. were able to develop a
model from the Hodgkin-Huxley framework
for K⁺ flux derived from the electrical conductance
contributions of specific and nonspecific
potassium channels in spores.

Kikuchi et al. provide evidence that an
“integrate-and-fire” model underpins the
commitment to exit from dormancy. The initial
K⁺ concentration in the dormant spore and
the efflux resulting from exposure to stimuli
determine the path out of dormancy. This
description provides a mechanistic explanation
for the phenomena of germinant memory
and spontaneous germination. However, it
is still unknown how various germination
receptors transduce the binding of germinant
to activate potassium efflux channels.
Similarly, it remains unclear how alterations
in the electrochemical potential of the inner
membrane contribute to the resumption of
full metabolic activity in germinating spores
or to the variability of germination across
bacterial populations. Indeed, the potential
for inherent variability of initial K⁺ concentrations
and the rate of K⁺ efflux in the model
presented by Kikuchi et al. may be the mechanism
that buffers erroneous germination decisions
amid fluctuating nutrient abundance.
For example, could the extreme longevity of
so-called superdormant spores be traced to
their initial K⁺ concentrations?

Future work examining the germination
behavior of spores originating from
divergent taxa may provide useful insights
into how environmental conditions dictate
thresholds for germination. Rapid signaling
through electrochemical potentials provides
reliable solutions when biological responses
involving macromolecular dynamics are not
feasible, such as for spores with negligible
metabolic activity. Just as cross-kingdom
models brought insight to the activation step
in germination, answers to the remaining
questions about germination may benefit
from models derived across the tree of life.

Germination decision-making

Bacillus subtilis spores exposed to germination stimuli over time activate germinant receptors, which leads to
efflux of potassium ions (K⁺) through a passive channel. Once the inner membrane potential reaches a critical
threshold, the spore state switches, committing to germination and outgrowth.

Figure labels: inner membrane; exposure to nutrient stimulus; Bacillus subtilis spore; no germination.
The endless search for better alloys


Machine learning narrows 
down the enormous 
search space for functional 
materials 


By Qing-Miao Hu and Rui Yang


Researchers and engineers are constantly
searching for materials with
specific properties to drive the rapid
development of various technologies.
Because the possible combinations of
materials are practically infinite,
these searches need to be strategized.
Conventional alloys generally consist
of a single principal metal
element accompanied by other elements.
More recently, researchers have ventured
into looking for alloys with multiple
principal elements (1, 2). This type of alloy,
called a high-entropy alloy (HEA), greatly
expands the search space of alloys for
materials design. On page 78 of this issue, Rao
et al. (3) present a physics-informed
machine-learning approach to screen for alloys
with a low thermal expansion coefficient
within the huge iron-cobalt-nickel-chromium
(Fe-Co-Ni-Cr) and iron-cobalt-nickel-chromium-copper
(Fe-Co-Ni-Cr-Cu) composition space. These materials
expand and contract very little with
temperature changes, which makes them valuable
for precision instruments in which
high dimensional stability of the
components is required.

An artificial neural network trained
on existing data on alloys can
learn and predict the complex
relationship between the elements
and their collective properties in an alloy.
Depending on how the artificial neural 
network is programmed and what kind of 
training data are used, the network can, 
for example, predict the thermal expansion 
coefficient when given the alloy’s composition.
Lacking a complete understanding
of the complex science that determines a
material’s property based on its composition,
material scientists often rely on experience
rather than fundamental physics
when searching for materials. An artificial
neural network can effectively assist a
scientist’s intuition to predict a material’s
properties on the basis of its composition
and reduce the time needed to synthesize
and test materials. However, the quality of
the machine-learning approach is limited
by the available data. This is especially a
problem for HEAs, for which the search space
is large and existing data are scarce.

Rao et al. designed a methodology for 
training a machine-learning algorithm 
and used it to search for “Invar” materials
among HEAs (see the figure). Invar materials,
named after the word “invariable,”
have a very low coefficient
of thermal expansion. The authors first
trained the learning algorithm using published
data on ~700 alloys and then tasked
the algorithm to generate a large number
of candidate compositions with low thermal
expansion coefficients. The top 1000 candidates
among the generated compositions were
selected by using another algorithm that
takes into account atomic features such as
the valence electron concentrations and
atomic radius of the elements. The selected
1000 compositions were further narrowed
down to 20 to 30 by using another learning
algorithm with additional inputs about
physical properties from density functional
theory calculations and thermodynamic
databases. Last, the top three candidates
were synthesized in the laboratory, and
their properties were measured and
fed back into the experimental dataset.
Incorporating the physical properties
as well as the experimental feedback
improves prediction accuracy
and efficiency. By repeating this procedure
six times, Rao et al. discovered 17 Invar 
HEAs out of millions of possible composi- 
tions (with each of the elements ranging 
from 0 to 100% with 1% resolution), and 
more Invar HEAs could be identified with 
subsequent cycles. 
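
The size of this search space can be checked with a short calculation. The sketch below counts the ways to assign whole-percent fractions that sum to 100% across a given number of elements (a standard stars-and-bars count). Treating every such combination as a distinct candidate composition is an assumption made here for illustration, and the function name is hypothetical.

```python
from math import comb


def count_compositions(n_elements, resolution_percent=1):
    """Count compositions whose element fractions are multiples of
    resolution_percent (0% allowed) and sum to exactly 100%.

    Assumes resolution_percent divides 100 evenly.
    """
    steps = 100 // resolution_percent  # number of equal "units" to distribute
    # Stars and bars: distribute `steps` units among `n_elements` bins.
    return comb(steps + n_elements - 1, n_elements - 1)


quaternary = count_compositions(4)  # e.g. Fe-Co-Ni-Cr
quinary = count_compositions(5)     # e.g. Fe-Co-Ni-Cr-Cu
print(quaternary, quinary)  # 176851 4598126
```

Because the four-element compositions are the copper-free subset of the five-element ones, the combined space holds about 4.6 million candidates under these assumptions, in line with the "millions" quoted above.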

The thermal expansion coefficients of 
the 17 Invar alloys discovered by Rao et 
al. are all smaller than the current known
record (~10⁻⁵ K⁻¹) of HEAs. Among the discovered
17 Invar HEAs, the lowest thermal
expansion coefficient is ~2 × 10⁻⁶ K⁻¹, comparable
with that of the conventional Fe-Ni Invar
alloy. The advantage of the Invar HEA is
that the huge composition space of HEAs 
makes it possible to find Invar alloys with 
some other superior properties, such as 
high strength and good ductility. 
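
To give a sense of scale for these coefficients, the sketch below applies the standard linear approximation ΔL = αLΔT. The coefficient of 2 × 10⁻⁶ per kelvin is the approximate value reported for the best of the new alloys; the 1 m bar, the 10 K temperature swing, and the function name are illustrative assumptions.

```python
def length_change_micrometres(alpha_per_kelvin, length_m, delta_t_kelvin):
    """Linear thermal expansion, dL = alpha * L * dT, returned in micrometres."""
    return alpha_per_kelvin * length_m * delta_t_kelvin * 1e6


# A hypothetical 1 m component warmed by 10 K.
invar_hea_shift = length_change_micrometres(2e-6, 1.0, 10.0)
print(round(invar_hea_shift, 3))  # 20.0
```

Because the relationship is linear, a coefficient five times larger would move the same bar by five times as much, which is why low-expansion alloys matter for precision instruments.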

The coefficient of thermal expansion— 
the target property in the study of Rao 
et al.—has a relatively straightforward 
relationship with the alloy composition. 
Mechanical properties, such as strength 
and ductility, may be more difficult to pre- 
dict with this method because they depend 
more on the microstructure of the alloy. 
The microstructure is determined by both 
the composition and processing factors 
during the fabrication of the alloy—for 
example, the temperature and duration of 
heat treatment, and the rolling or forging 
of the material. Researchers have tried to
find connections between the composition
and microstructure of steels by using
machine learning, but the influence of
processing on the microstructure has not been
considered. Moreover, the
microstructure-to-property relationship also remains a
missing link for building a more effective
machine-learning search engine for alloys
with target mechanical properties (4).

Cutting out the guesswork of alloy recipes

Rao et al. developed a machine-learning algorithm that can be repeatedly trained to search for high-entropy alloys with low
thermal expansion coefficient when given their compositions. Future algorithms may consider additional input parameters,
such as those involving the alloy’s synthesis and processing methods and its microstructure.

Because of the complex interactions 
among the many parameters, the learn- 
ing algorithms will need to incorporate 
additional details to be more versatile 
and effective in their searches, which will 
make the algorithm design challenging. 
Furthermore, for most applications, there 
is also the practical need to find a material 
that can satisfy multiple property require- 
ments, such as a good conductor that is 
also flexible. This will also increase com- 
plexity and complicate algorithm design. 
Despite these challenges, researchers have 
found success in using machine learning 
to search for a wide range of exotic mate- 
rials, such as complex doped perovskites 
with high ferroelectric Curie temperature 
(5) and lead-free piezoelectrics with large 
electrostrains (6). It has also been used to 
predict the elastic modulus of multicompo- 
nent Fe-Cr-based alloys (7). 

Direct composition-to-property predic- 
tion for materials design remains a chal- 
lenge for material scientists. With the 
accumulation of experimental datasets, 
development of the optimization models 
for artificial neural networks, and a better 
understanding of the physics underlying 
the relationships among composition,
processing, microstructure, and
property, a universal virtual
laboratory may, one day,
become reality.
NEURODEGENERATION


A perturbed network in neurodegeneration 


Three proteins act in concert to cause neuronal pathology 


By Jean-Marc Gallo and Dieter Edbauer


Amyotrophic lateral sclerosis (ALS)
and frontotemporal dementia (FTD) 
are neurodegenerative diseases that 
share clinical, pathological, and 
genetic features. A pathological 
hallmark of ALS-FTD is the pres- 
ence of ubiquitin-positive protein aggre- 
gates (or inclusions) in the cytoplasm of 
affected neurons. The main component 
of inclusions in most ALS cases and half 
of FTD cases is the mostly nuclear RNA 
binding protein, TDP-43 (TAR DNA binding
protein 43) (1). The existence of rare
ALS-causing mutations in TARDBP, which 
encodes TDP-43, suggests a causal role of 
TDP-43 dysfunction in the pathogenesis 
of ALS-FTD (2, 3). Pathogenic mutations 
in several other genes are more common 
in ALS and FTD, but how they trigger cy- 
toplasmic aggregation of TDP-43 re- 
mains unknown. On page 94 of this 
issue, Shao et al. (4) partly address 
this by showing that two ALS-FTD- 
associated genes cooperate to cause 
TDP-43 cytoplasmic aggregation by 
impairing endosome maturation. 

The most common cause of ALS-FTD
is a long G₄C₂ hexanucleotide repeat
expansion in the C9ORF72 gene.
The number of G₄C₂ repeats increases
from 2 to 20 in unaffected individuals
to hundreds or thousands in patients.
C9ORF72 G₄C₂ RNA repeats are translated
by a noncanonical mechanism
into dipeptide repeat (DPR) proteins.
DPRs are toxic in cell and animal
models and form abundant neuronal
inclusions in patients. Loss-of-function
mutations in TBK1 (TANK-binding
kinase 1) can also trigger the
full ALS-FTD spectrum with TDP-43
pathology (5). TBK1 is a central node
in many cellular pathways, including
innate immune responses and autophagy.
Rare patients carrying both
a C9ORF72 repeat expansion and a TBK1
loss-of-function mutation have an earlier
age of onset and faster disease progression.
To examine the pathological connection
between these alterations, Shao et al.
conducted a series of experiments in vivo in
mice expressing (G₄C₂)n or DPR-encoding
transgenes through the injection of
recombinant adeno-associated virus directly into
the brain.

Shao et al. showed that in mice expressing
long G₄C₂ repeats, TBK1 was
phosphorylated and colocalized in cytoplasmic
puncta with three C9ORF72-derived
DPRs: poly(Gly-Ala) [poly(GA)], poly(Gly-Arg),
and poly(Pro-Arg). They confirmed
these findings in postmortem material
from the frontal cortex and hippocampus
of C9ORF72 FTD patients. Among these
three DPRs, poly(GA) triggered TBK1
sequestration into cytoplasmic inclusions,
likely resulting in a loss of activity toward
its endogenous targets. Sequestration and
inactivation of cellular proteins implicated
in pathogenic processes is a key concept in
neurodegenerative diseases, and poly(GA)
inclusions have been shown to trap stalled
proteasomes and nucleocytoplasmic
transport factors (6).

Connecting alterations in ALS-FTD

The protein products of several amyotrophic lateral sclerosis
and frontotemporal dementia (ALS-FTD)-causing genes form
a disease module that affects endolysosomal functions. In
C9ORF72-linked ALS-FTD, poly(Gly-Ala) [poly(GA)] disturbs this
module by inhibiting TANK-binding kinase 1 (TBK1). This results
in impaired endosome maturation and induces TAR DNA binding
protein 43 (TDP-43) aggregation.

To address the consequence of reduced
TBK1 activity on poly(GA)-induced pathology,
Shao et al. virally expressed poly(GA) in
a mutant mouse with reduced TBK1 activity.
Poly(GA) alone induced a series of pathological
phenotypes, including rare TDP-43
inclusions, and these were exacerbated when
TBK1 kinase activity was reduced. The authors
also observed that poly(GA)-expressing mice
developed defective early and late endosomes
in cortical neurons containing poly(GA)
inclusions, which were more abundant when
TBK1 activity was reduced.

TBK1 is involved in endosome 
maturation, and Shao et al. found 
that impaired endosome matura- 
tion led to TDP-43 pathology in vi- 
tro. This is consistent with previous 
results suggesting that the endolyso- 
somal pathway is implicated in the 
turnover of TDP-43 (7). Preliminary 
data report endolysosomal deficits 
leading to TDP-43 pathology in hu- 
man motor neurons lacking TBK1 
activity (8), which complements the 
findings of Shao et al. Moreover, 
FTD-causing variants in progranulin
(GRN) and transmembrane protein
106B (TMEM106B) are tightly
linked to lysosomal function and 
TDP-43 pathology in patients. Like 
many heterogeneous nuclear ribo- 
nucleoproteins, TDP-43 shuttles 
rapidly between the nucleus and the 
cytoplasm; how impaired endosome
maturation leads to the formation of
TDP-43 inclusions in the cytoplasm
and to nuclear depletion, whichever
comes first, is still an important
outstanding question.

Shao et al. suggest a model in 
which poly(GA) inclusions sequester 
TBK1, leading to its autophosphory- 
lation; although activated by this 
phosphorylation, the sequestration 
of TBK1 into poly(GA) inclusions 
would result in a reduction of its 
activity, disrupt endosome maturation,
and induce TDP-43 aggregation. This
would, in turn, impair the role of TDP-43 
in RNA processing, especially through re- 
pression of cryptic exon inclusion in genes 
that are essential for neuronal survival (9, 
10). It is intriguing that reduced expression 
of C9ORF72, which also occurs in repeat 
expansion carriers, promotes interferon- 
mediated innate immune responses by 
inhibiting degradation of stimulator of 
interferon genes (STING) in the lysosomal 
pathway (11). STING oligomerization leads
to TBK1 phosphorylation. It would be in- 
teresting to untangle the complex effects 
of C9ORF72 haploinsufficiency, poly(GA) 
expression, and TBK1 loss of function on 
neurodegeneration in vivo. 

The study of Shao et al. has a much wider
conceptual implication for understanding 
the pathogenesis of ALS-FTD and eventu- 
ally the identification of new therapeutic 
targets. To date, more than 20 seemingly 
unrelated genes have been identified as 
causative genes or modifiers of ALS-FTD. 
System-level analysis of products of dis- 
ease genes, disease modifiers, or disease- 
associated proteins shows that they are 
often directly or indirectly connected to 
each other and cluster within the same in- 
teractome neighborhood, forming subnet- 
works or disease modules (12), including in 
neurodegenerative conditions (13). Several
ALS-FTD-associated proteins affect endosomal
or lysosomal functions and form an
interconnected network, akin to a disease 
module (see the figure). The demonstra- 
tion by Shao et al. that C9ORF72-derived 
poly(GA), TBK1, and TDP-43 act in concert 
to cause ALS-FTD phenotypes is consistent 
with the concept of network perturbation 
in disease (14). Complex interaction with 
cell type-specific factors could also explain 
why neither DPRs, RNA foci, nor C9ORF72
haploinsufficiency correlate with regional 
neurodegeneration and TDP-43 pathology. 
Elucidating the full connectivity of ALS- 
FTD proteins will lead to the identification 
of network hubs that could represent new 
therapeutic targets for both sporadic and 
familial ALS-FTD. 
MICROBIOTA


Bacteria can be used to study how communities progress when given different initial conditions. 


Community instability in the microbial world

Miniature ecosystems provide insights into general ecological principles


By Matthias Huelsmann
and Martin Ackermann


From the rainforests that sequester
large amounts of carbon (1) to the
gut microbiota that play an impor- 
tant role in the health of the host (2), 
ecological communities of all forms 
and sizes serve valuable functions. 
Although stable and diverse activities are 
more likely to be found in communities 
that are stable and diverse (3), it is unclear 
exactly how diversity and stability within 
communities influence each other (4). 
Studying this relationship in large-scale 
ecosystems, such as the rainforest, is often 
unfeasible because of practical limitations. 
On page 85 of this issue, Hu et al. (5) present
observations of bacterial communities
under highly controlled conditions. They
found that diverse communities lose sta- 
bility and that this negative effect of diver- 
sity on stability is amplified when species 
in their communities interact strongly. If 
this applies to natural ecological commu- 
nities of different scales, human activities 
that strengthen interactions between spe- 
cies may destabilize certain valuable eco- 
logical functions. 

There are two different aspects of stabil- 
ity in ecological communities: functional 
and compositional. Functional stability 
refers to the variation in the collective 
function of an ecological community upon 
environmental changes, and compositional 
stability refers to the variation in the pop- 
ulation densities of community members 
over time. In the face of accelerating en- 
vironmental changes, finding what deter- 
mines the functional stability of commu- 
nities has never been more relevant (3). 
Functional stability is positively correlated 
with diversity, that is, more-diverse communities
should lead to functional stability.
The intuitive explanation is that diverse
communities are likely to contain multiple 
species that carry out similar functions but 
are affected differently by environmental 
change, so that the function of a disturbed 
species can be compensated by another 
species (4). The relationship between com- 
positional stability and diversity is less 
clear and depends heavily on how strongly 
species in communities interact. The “com- 
petitive exclusion principle” (6) predicts 
that the number of coexisting species in 
a community cannot exceed the number 
of distinct limiting nutrients. However, 
species within a community can increase 
species diversity beyond this limit through 
interactions. For example, one species can 
release partially processed nutrients that 
can be consumed by another species (7). 
When interactions between species are too 
strong, some models have predicted that 
diverse communities tend to lose compo- 
sitional stability and exhibit fluctuating 
population dynamics (8), although these 
fluctuations can simultaneously promote 
species coexistence (9). 

To put these predictions to the test, a 
system would need to be studied in which 
communities of varying diversity can easily 
be constructed and interaction strengths 
between species can be adjusted with min- 
imum external variation. Hu et al. used 
bacterial communities grown in controlled 
laboratory conditions to construct such a 
system. Specifically, they sought to find out 
how the initial diversity of a community 
and the average strength of interspecies 
interactions would affect a community’s 
final diversity and compositional stability. 

Based on a mathematical model, the 
bacterial communities should fall into 
three distinct phases (see the figure). If a 
community starts with just a few species, 
or if the interspecies interactions are weak, 
all of its species should stably coexist. If 
a community starts with more species, or 
if the interspecies interactions are slightly 
stronger, some of its species should be- 
come extinct while the remaining species 
should stably coexist. And, if a community 
starts with even more species, or if the 
interspecies interactions are even stron- 
ger, more species should become extinct 
and the remaining species should display 
heavily fluctuating population dynamics. 
However, these fluctuations would also 
slow down the rate of species extinctions. 
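
The three phases can be explored numerically with a small simulation. The sketch below is not Hu et al.'s model, whose equations are not reproduced here. It assumes a generalized Lotka-Volterra system in which every species has the same growth rate and carrying capacity and the competitive interaction strengths are drawn at random. The function names, parameter values, and extinction threshold are illustrative assumptions.

```python
import random


def simulate_glv(num_species, mean_strength, steps=4000, dt=0.05, seed=0):
    """Integrate an assumed generalized Lotka-Volterra model with Euler steps.

    dN_i/dt = N_i * (1 - N_i - sum_j a[i][j] * N_j), with a[i][i] = 0.
    Off-diagonal a[i][j] are drawn uniformly from [0, 2 * mean_strength],
    so their average equals mean_strength.
    """
    rng = random.Random(seed)
    # a[i][j]: how strongly species j suppresses the growth of species i.
    a = [
        [0.0 if i == j else rng.uniform(0.0, 2.0 * mean_strength)
         for j in range(num_species)]
        for i in range(num_species)
    ]
    n = [0.1] * num_species  # every species starts at the same low abundance
    for _ in range(steps):
        growth = [
            n[i] * (1.0 - n[i] - sum(a[i][j] * n[j] for j in range(num_species)))
            for i in range(num_species)
        ]
        # Clamp at zero so numerical error cannot produce negative abundances.
        n = [max(x + dt * g, 0.0) for x, g in zip(n, growth)]
    return n


def count_survivors(abundances, threshold=1e-3):
    """Count species whose final abundance exceeds the extinction threshold."""
    return sum(1 for x in abundances if x > threshold)


if __name__ == "__main__":
    # Sweep initial diversity and mean interaction strength.
    for num_species in (4, 12, 24):
        for mean_strength in (0.1, 0.5, 1.0):
            final = simulate_glv(num_species, mean_strength)
            print(num_species, mean_strength, count_survivors(final))
```

Sweeping the two parameters in this way gives a rough picture of where full coexistence gives way to extinctions. Detecting the fluctuating phase would additionally require recording the abundances over time rather than only their final values.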

Phase diagram of an ecological community

Hu et al. found that bacterial communities exist in three phases, depending on their initial diversity and the
interaction strengths among species. If these phases are present in larger-scale ecological communities, then
human activities that increase interspecies interactions could shift communities to the unstable phase.

To evaluate whether this model reflects
reality, Hu et al. gathered a diverse collection
of bacterial species and constructed
communities with different levels of initial
diversity. They were able to adjust the interspecies
interaction strengths by supplementing
the growth medium with different
amounts of key nutrients. The authors
monitored population densities through 
DNA sequencing and community function 
by estimating total biomass over 10 days. 
They observed that the bacterial communi- 
ties did fall into the three phases that were 
predicted by the model. The predicted fluc- 
tuations manifested both in the population 
densities and in the total biomass. Hence, 
communities that were initially very di- 
verse or very strongly interacting tended 
to lose both compositional and functional 
stability. Overall, the findings of Hu et al. 
suggest that ecological communities tran- 
sition through distinct phases as inter- 
species interaction strength or diversity 
changes, similar to how matter transitions 
abruptly through distinct states when tem- 
perature or pressure changes. 

Human activity can transition ecological 
communities through phases and jeopar- 
dize valuable functions. For example, if the 
amount of nutrients in an ecosystem is in- 
creased because of intensive fertilization, 
this may increase the strength of interac- 
tions between species (10) and shift the 
community out of a stable phase of spe- 
cies coexistence and into a phase of com- 
positional instability. Although this loss of 
compositional stability might buffer the 
rate of extinctions, it may be accompanied 
by a loss of functional stability. Therefore, 
reducing the amount of nutrients introduced
to natural ecosystems might be one
way to safeguard the many valuable functions
provided by ecological communities.

Bacterial communities are attractive 
systems for studying principles in ecology, 
but it is not certain that these observations 
apply to communities of larger organisms. 
However, Hu et al.’s mathematical model
was not specific to bacterial communities 
and it predicted the three phases across 
a variety of community and interaction 
types. Therefore, the principles described 
by the authors might hold for ecological 
communities across scales.
Why nations lead or lag in energy transitions


Policy-driven change hinges on institutions that support insulation or compensation 


By Jonas Meckling, Phillip Y. Lipscy,
Jared J. Finnegan, Florence Metz


Russia’s invasion of Ukraine has disrupted
energy markets, producing
price spikes reminiscent of the 1970s.
Many suggest that the crisis may accelerate
transitions away from fossil
fuels and reduce greenhouse gas
(GHG) emissions. Yet, governments have 
responded very differently to the price 
shock. Though some are prioritizing clean 
energy, others are doubling down on fossil 
fuel production. Why do countries respond 
so differently to the same problem? Access 
to domestic fossil fuel resources is only 
part of the story. Countries also vary in the 
political sources that enable transforma- 
tional change in energy and climate policy 
(1, 2). We draw on two historical episodes 
illustrating variation in energy transitions 
across countries—the 1970s oil shocks, and 
policies to address climate change—to offer 
important lessons on the political oppor- 
tunities and constraints for policy-makers 
across different countries to accelerate the 
transition to clean energy. 

Energy transitions impose adjustment 
costs on businesses and consumers, creating 
economic winners and losers (3). Supply-side 
policies, such as fuel economy standards or 
renewable energy deployment standards, 
primarily put visible costs on businesses. 
Demand-side policies, such as gas or carbon 
taxes, impose costs most directly on consum- 
ers. Disadvantaged businesses—such as fossil 
fuel producers and energy-intensive indus- 
tries—have strong incentives to lobby against 
such policies, and consumers may express 
their displeasure by voting against incum- 
bent politicians. Some countries have stron- 
ger institutions to manage such opposition to 
change than others. 
For example, in the 1970s, countries sought
to reduce dependence on oil—particularly for 
electricity generation and transportation—in 
response to a global supply shock. However, 
outcomes varied widely (see the figure, top) 
(4). There is similarly substantial variation in 
policies to promote clean energy transitions 
in response to climate change (see the figure, 
bottom) (5). Countries have also taken diver- 
gent paths in their responses to the current 
energy price shock. 

We draw on recent research on energy 
policy in advanced industrialized countries 
to illustrate how they pursued different 
pathways. We propose that, broadly speak- 
ing, governments can pursue energy tran- 
sitions through one of three pathways: in- 
sulation—policy-makers are shielded from 
political opposition; compensation—policy- 
makers ease the burden of adjustment for 
business and consumers; and markets— 
policy-makers step back and markets drive 
change. The first two pathways enable a 
policy-driven approach that gives direction 
to markets and buffers the costs of market 
developments. The third pathway defers 
to market forces to set the pace of change. 
Market-based transitions are often subject 
to volatility, reversal, and price fluctuations. 


INSULATION 
Policy-makers enjoy varying degrees of in- 
sulation from policy backlash depending 
on bureaucratic and electoral institutions. 
Autonomous bureaucracies are characterized 
by strong mandates, high levels of expertise, 
low levels of political appointees, and an ad- 
ministration staffed with elite civil servants 
recruited meritocratically and with an ex- 
pectation of long-term employment. Civil 
servants in such bureaucracies are better in- 
sulated from business and public opposition 
to costly policies than politicians reliant on 
corporate campaign donations and voter sup- 
port. Similarly, proportional electoral rules 
(seats allocated in a legislature proportional 
to vote shares) tend to better insulate politicians
from voter backlash than majoritarian
rules (“winner-takes-all” whereby a candi- 
date receiving the highest vote share in a dis- 
trict represents the district) (6). 
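
The difference between the two rule types can be made concrete with a toy calculation. The sketch below uses hypothetical vote counts for three hypothetical parties and, as an assumption, the largest-remainder method with a simple quota for the proportional case. Real proportional systems use a variety of formulas, so this is one illustration rather than a description of any particular country's rules.

```python
from collections import Counter


def majoritarian_seats(districts):
    """Winner-takes-all: the top vote-getter in each district wins its seat."""
    seats = Counter()
    for votes in districts:
        winner = max(votes, key=votes.get)
        seats[winner] += 1
    return dict(seats)


def proportional_seats(districts, num_seats):
    """Largest-remainder allocation over pooled votes (assumed formula)."""
    totals = Counter()
    for votes in districts:
        totals.update(votes)  # add each district's counts to the pooled totals
    quota = sum(totals.values()) / num_seats
    # Each party first receives one seat per full quota of votes.
    seats = {party: int(count // quota) for party, count in totals.items()}
    remainders = {party: totals[party] - seats[party] * quota for party in totals}
    leftover = num_seats - sum(seats.values())
    # Any seats still unfilled go to the parties with the largest remainders.
    for party in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        seats[party] += 1
    return seats


# Hypothetical vote counts in three single-seat districts.
EXAMPLE_DISTRICTS = [
    {"A": 45, "B": 40, "C": 15},
    {"A": 48, "B": 42, "C": 10},
    {"A": 40, "B": 45, "C": 15},
]

if __name__ == "__main__":
    print(majoritarian_seats(EXAMPLE_DISTRICTS))
    print(proportional_seats(EXAMPLE_DISTRICTS, 3))
```

In this toy case the winner-takes-all tally depends entirely on which party edges ahead in each district, whereas the proportional tally tracks pooled vote shares. A modest swing among voters therefore moves fewer seats under the proportional rule, which is the sense in which such rules insulate politicians.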

During the 1970s oil crises, the Japanese 
and French governments substantially moderated
their reliance on oil consumption. The
Japanese government’s promotion of energy 
conservation and diversification relied on 
the bureaucratic autonomy of the Ministry of 
International Trade and Industry and a rela- 
tively proportional, single nontransferable 
vote, multimember district electoral system 
that allowed politicians to remain secure in 
office despite imposing exceptionally high 
prices for fossil fuel consumption (7). 

In France, despite the country’s majoritar- 
ian electoral system, bureaucratic insulation 
gave the government a relatively free hand 
in the electricity sector. The Commissariat 
à l’Énergie Atomique and state-owned electricity
utility Électricité de France (EDF)
operated with a high degree of autonomy in 
implementing the ambitious Messmer Plan 
to transition to nuclear energy. The country 
rapidly expanded nuclear power from 8% of 
electricity generation capacity in 1973 to 70% 
by the mid-1980s. 

France is following a similar playbook in 
response to the gas price shock following 
Russia’s invasion of Ukraine. In February, 
President Macron announced that the coun- 
try would construct up to 14 new-generation 
reactors. Although EDF is no longer state- 
owned, the French government holds a large 
majority stake in the company, which contin- 
ues to insulate it from business opposition 
and grant it a high level of control over the di- 
rection of the country’s electricity sector. The 
French government also announced plans to 
fully renationalize EDF in the face of the en- 
ergy and climate crises. 

By contrast, Japan’s political institutions 
were changed starting in the 1990s: The new 
mixed-member majoritarian electoral sys- 
tem empowers price-sensitive consumers, 
and bureaucratic autonomy has been weak- 
ened considerably. Under this institutional 
configuration, successive Japanese governments
have struggled to accelerate the country’s clean
energy transition (8). The country’s response 
to the war in Ukraine has sought to cushion 
the impact for consumers and businesses by 
subsidizing oil wholesalers and maintaining 
economic interests in Russian natural gas 
projects in Sakhalin. 

Insulation can also vary at the subna- 
tional level. California followed a path of 
insulation from political headwinds by 
delegating regulatory power for
the clean energy transition to an 
independent government agency 
(9). The powerful California Air 
Resources Board (CARB) steeled 
itself in battles over air pollution. 
It has highly specialized career 
civil servants who cannot be voted 
out of office for adopting costly 
policies. And they have used that 
power. For example, the state’s 
low-carbon transport policies im- 
pose an indirect carbon price of up 
to $1000 per metric ton of carbon 
dioxide equivalent, one of the high- 
est globally. So far, the legislature 
has not touched CARB’s power to 
drive climate and clean energy 
policy. Indeed, the agency may be 
beneficial to elected leaders be- 
cause it can take the blame for any 


Variation in policy responses

Facing oil price shocks in the 1970s [top, data from (4)] and more recently climate change [bottom, data from (5)], countries have demonstrated variability in responses to the need for an energy transition. Top panel: reduction in oil demand (%) (1973-1985).

governments do not consistently
outperform democracies in energy transi- 
tions and environmental outcomes (10). 


COMPENSATION 
A compensation path seeks to secure the sup- 
port of businesses and consumers that stand 
to bear the costs of policy change. Political 
institutions affect the feasibility of compen- 
satory policies. Corporatist institutions grant 
enduring, privileged policy-making access to 
major associations representing business and 
labor interests, facilitating stable bargaining 
arrangements. Countries with such institu- 
tions can strike long-term compensatory 
deals that ease the burden of energy transi- 
tions for economic losers. Countries with es- 
tablished welfare state institutions that offer 
generous social safety nets can more credibly 
commit to compensating individuals facing 
economic dislocation and high energy prices 
(2, 11). Many northern European countries, 
such as Denmark, Finland, Germany, the 
Netherlands, and Sweden, have institutional 
endowments that facilitate compensation. 
Germany’s response to the oil crises was 
to ease the transition away from oil through 
compensatory bargaining with industry asso- 
ciations and labor unions. Coal and nuclear 
energy were expanded through subsidies,
such as the “coal penny” that was added to 
consumers’ bills. Offering support for both 
industries, instead of picking one, reduced 
political conflict. The government used the 
country’s welfare system to ease the burden 
of higher energy costs for households. 

In Germany’s contemporary clean energy 
transition, policy-makers are relying on a 
similar approach. Successive governments 
have generously subsidized clean technolo- 
gies using revenue raised through increased 
energy prices for consumers, while at the 
same time compensating dirty producers to 
transition away from fossil fuels. The feed-in 
tariff—a subsidy for wind and solar electric- 
ity—has helped to substantially bring down 
the cost of clean technologies in Germany 
and abroad, particularly solar. Politically, 
the feed-in tariff has worked to mobilize a 
broad alliance of farmers, green activists, 
conservatives, and progressives for reform 
(12). To phase out lignite coal, the country 
negotiated a “coal compromise” that pro- 
vides EUR 40 billion to regions with coal 
mining and to coal-fired power stations in 
return for political support for a phase-out. 
Starting in 2023, the government plans
to help households cope with
increasing energy prices by offering 
a “climate premium.” Countries with 
analogous institutional arrange- 
ments, such as Nordic countries, use 
a similar compensation-based ap- 
proach to energy transitions (2). 

By contrast, countries with weak 
welfare states and pluralist state-busi- 
ness relations, in which many groups 
compete for influence, tend to see 
frequent policy reversals and reliance 
on ad hoc, short-term measures. For 
example, the US Trade Adjustment 
Assistance Program, which seeks 
to mitigate the impacts of trade on 
workers and industry, has faced re- 
peated budget cuts and rule changes, 
including a drastic reduction in 1981 
as utilization soared in the aftermath 
of the oil crises and Japanese indus- 
trial competition. Countries like the 
US tend to lack institutional founda- 
tions to pursue “just transitions” de- 
spite calls to compensate the losers of 
climate policy. 

Developing countries often lack 
resources and established domestic 
institutions for compensation like 
those in welfare states. Here, inter- 
national institutions that provide 
bilateral and multilateral aid and 
other finance streams can facilitate
compensatory arrangements,
helping producers and consumers
absorb costs, reducing political op- 
position to energy transition policies. 


MARKETS 

A transition path through markets is ef- 
fectively the absence of policy reform that 
imposes direct costs on producers and con- 
sumers. Instead, governments rely largely 
on markets to transform the energy sector. 
This pathway is common in countries whose 
institutions allow opponents to more easily 
block costly energy policies. In such coun- 
tries, insulation from voters and business 
is limited because of majoritarian electoral 
rules and weak bureaucracies. Compensation 
is difficult owing to small welfare states and 
pluralist state-business relations. Policy re- 
sponses to crises tend to focus on short-term 
stopgap measures and foreign policy solu- 
tions that reduce domestic adjustment costs. 
Countries such as Australia, Canada, the 
United Kingdom, and the United States tend 
to fall into this group. 

After the 1973 oil price shock, efforts by 
the US and Australian governments to fa- 
cilitate policy-driven transitions faltered in 
the face of resistance from opponents that 
stood to bear the costs. For example, gaso- 
line tax hikes floated by the Nixon and Ford 
administrations as a potential energy conservation
measure faced intense objections from
congressional legislators concerned about 
electoral and industry backlash. The US 
majoritarian electoral system and presiden- 
tial authority over the federal bureaucracy 
provide limited insulation, and opponents of
policy-driven transitions can often effectively 
block change. The scope for compensation is 
limited because of pluralist state-business re- 
lations and a weak welfare state. Government 
initiatives to develop alternative energy 
sources in Australia during the 1970s faced 
similar challenges, and the economy’s reli- 
ance on oil remained largely unchanged. The 
government promoted market-based mea- 
sures to encourage oil exploration, including 
import parity pricing to bring domestic oil 
prices in line with international levels. 

The two countries have also struggled to 
promote contemporary clean energy tran- 
sitions. US Vice President Gore’s initiative 
for a British thermal unit (BTU) energy tax 
during the Clinton administration faced 
intense opposition from energy-intensive 
industries and lawmakers concerned about 
reelection. “Getting BTU’d” became an en- 
during warning against similar attempts 
after supportive Democratic legislators 
suffered steep losses in the 1994 midterm 
elections. Australia’s majoritarian electoral 
rules based on preferential instant-runoff 
voting make politicians highly vulnerable 
to voter backlash over energy prices. Prime 
Minister Julia Gillard’s 2012 carbon pric- 
ing scheme led to a sharp decline in sup- 
port for her Labor Party. The issue became a 
centerpiece of Liberal Party opponent Tony 
Abbott’s successful 2013 election campaign. 
Australia promptly became the first country 
in the world to rescind a carbon tax. 

The US and Australia have lacked a stable 
national climate policy. Efforts to reduce en- 
ergy emissions have been enacted in both 
countries only to be reversed by the next 
government (13). The absence of consistent 
energy policies has elevated the role of mar- 
kets. Much of the emissions reductions in the 
US have been the result of a market-driven 
switch from coal to natural gas. 

In the current crisis, the US federal gov- 
ernment’s immediate reaction was to facili- 
tate oil and gas drilling on public land to in- 
crease oil production and bring down market 
prices. Additionally, the US has encouraged 
oil producers such as Saudi Arabia to expand 
production. At the same time, 24 US states 
have moved to reduce fuel taxes for consum- 
ers or are considering doing so. The Morrison 
government in Australia similarly slashed the 
fuel excise tax in half from 44.2 to 22.1 cents 
per liter. These efforts focused on reducing 
disruptive energy price volatility for industry 
and consumers. 
LESSONS FOR POLICY

Variation in the ability to adopt costly en- 
ergy transition policies has important 
implications for the options that policy- 
makers have in different country settings. 
First, policy-makers that can in principle 
rely on mechanisms of insulation or com- 
pensation need to purposefully leverage 
both. If they have autonomous agencies, 
they can delegate policy design questions 
to those bureaucracies (9). They also need 
to be sensitive about how to bundle com- 
pensation packages to mobilize political 
support. The compensation needed to bring 
political groups and communities on board 
depends, for example, on how vulnerable 
they are to both costly climate policy and 
the physical impacts of climate change (14). 
Countries that can absorb costly policy in- 
vestments are thus better able to invest in 
the deployment of frontier technologies 
that are not yet cost-competitive with fossil 
fuel technologies. Historically, this included 
wind power technology in Denmark and so- 
lar photovoltaics in Germany. Today, these 
include hydrogen storage; hydrogen fuel 
cells; and carbon capture, use, and storage, 
to name a few. The hope to reduce hard- 
to-abate GHG emissions in sectors such as 
steel, cement, shipping, and aviation (15)
thus often rests with those countries able to 
pursue policy-driven transitions. Although 
these countries bear the costs of developing 
niche markets for costly technologies, the 
investments can be worthwhile if they lead 
to long-run economic advantages such as 
export industries or cheaper energy inputs. 

Second, countries that tend to pursue 
market-driven transitions rely largely on 
first-mover countries—those with the ca- 
pacity to absorb costly policy action—to 
help bring down the cost of clean technolo- 
gies through policy for follower countries. 
But once clean technologies are cost-com- 
petitive, market-driven transitions can ac- 
celerate rapidly. For example, US adoption 
of solar and wind power remained robust 
even under the Trump administration as the 
cost of renewable power generation contin- 
ued to fall. In this phase, a commitment to 
free market principles can be supportive of 
energy transitions. Governments that lack 
mechanisms of insulation and compensa- 
tion can—at a minimum—support energy 
transitions by easing regulatory barriers to 
the deployment of clean technologies, such 
as simplifying permitting of renewable en- 
ergy plants and grid infrastructure. 

Third, policy-makers that cannot pur- 
sue insulation or compensation can still 
pursue policies whose costs are relatively 
diffuse and less visible, and thus less po- 
litically salient. This relates in particular 
to public investments in research and development
(R&D) and clean energy deployment.
These costs are spread across all
taxpayers and not directly visible to voters 
and industry as they would be through a 
carbon price or regulation. Clean energy 
R&D funding and tax credits for wind and 
solar have been the one constant in US 
clean energy policy, garnering bipartisan 
support. The recent Inflation Reduction 
Act in the US follows this logic. This ap- 
proach differs from compensation in that 
it offers carrots without sticks and tends 
to be based on ad hoc deals rather than a 
stable long-term bargain. Clean energy tax 
credits in the US, for instance, have ex- 
pired frequently, leading to boom-and-bust 
cycles in renewable energy development. 

Climate laggards are often federal coun- 
tries where states or provinces can take the 
lead in energy transitions. Subnational ju- 
risdictions may have greater institutional 
capacity to pursue policy-driven energy 
transitions than the national government, 
as is the case for California and New York. 
Policy-makers in federal systems can thus 
leverage pockets of insulation or compensa- 
tion in subnational jurisdictions to promote 
clean energy from the bottom-up. 

Different political paths result in clean 
energy transitions at varying paces. This 
should temper our expectations on com- 
mon problems—such as price shocks and 
climate change—mobilizing countries 
across the globe for a clean energy future. 
At the same time, understanding these dif- 
ferences helps us target policy interventions 
more carefully to national opportunities 
and constraints. 
Toward a more inclusive internet


Expecting perfection impedes progress in the march 
toward digital equality, argues a legal scholar 


By Dov Greenbaum 


Legal scholar Orly Lobel’s The Equality
Machine is a masterful analysis of the
many “inequitable dimensions of digital
existence.” The book consists of 10
substantive chapters wherein Lobel
expertly describes both the
opportunities and the discrimination
engendered by new technologies,
particularly artificial
intelligence (AI). Broadly, these
chapters focus on the use and
abuse of technologies in employment,
social interactions, health
care, digital assistants, media,
online dating, sex robots, education,
and artificial companions.
In each of these arenas, Lobel
notes how poor outcomes often
arise not from overt malice but rather
when technologies either competently
replicate societal shortcomings or are
incompetently implemented.

The Equality Machine
Orly Lobel
368 pp.

Technology, notes Lobel, “has the power
to either mitigate or intensify problematic
patterns of exclusion,” especially with
regard to gender, race, sexuality, geography,
and ability. However, while some digital
technologies exacerbate harassment, for
example, providing potential perpetrators
with new opportunities and methods
of abuse, technology can also be a part of
the solution, providing virtual spaces for
empathy-building and harnessing algorithms
that promote prosocial conduct.

Ultimately, observes Lobel,
technology is merely a tool that
“can nudge change” when made
ethical by design. As such, she
argues that the idea of equality
must be systemically prioritized
in the development and implementation
of new technologies.
Large datasets can help us to
identify past, present, and potentially
future discrimination, recognize
sources of disparities, and
then work toward rectifying them, argues 
Lobel. “Fairness through awareness” comes 
from acknowledging inherent biases in the 
underlying data while directing techno- 
logical outputs of that data toward more 
equitable outcomes. 

Lobel cautions that there are always 
trade-offs when new technologies are de- 
ployed. Fairness and equality often come 
at social costs, she notes, and technology 
can serve both “to support and to surveil, 
to learn and to manipulate, to heal and to 
harm, to detect and to conceal, to equalize 


Service robots running AliGenie disrupt problematic 
gender norms reproduced in other digital assistants. 


and to exclude.” Technologies designed to 
uncover workplace harassment, for example, 
“can also chill speech and invade privacy.” 

Despite these tensions, Lobel maintains 
that we should eschew pessimistically re- 
jecting innovations that are presumed to be 
problematic. Experimentation and failure 
are, in her opinion, important steps in the 
march toward technological equality, and 
we ought to focus not on whether an algo- 
rithm is flawed but on whether “it is safer, 
fairer, and more unbiased relative to what 
came before it.” 

To attain equality through technology, 
Lobel promotes frameworks that will allow 
researchers to advance ethical outcomes. 
These frameworks would encourage compa- 
nies to seek out more inclusive and diverse 
technology designers, advance thorough 
algorithmic detection of inequalities, seek 
out greater public and regulatory oversight 
for monitoring emerging technologies such 
as AI, and foster corporate compliance with 
equitable ideals by focusing on fairer algo- 
rithmic outcomes rather than restraining 
problematic data inputs. 

But she argues that technologies 
will continue to perpetuate inequal- 
ity as long as industry aims to ex- 
ploit human-generated data for profit. 
As such, Lobel’s frameworks also promote 
legislation that conceives of consumer data 
as a shared resource that should be made 
accessible for nonprofit research and pub- 
lic auditing. 

Lobel advises that the pursuit of broader 
social equality through data analysis will 
often require removing politically correct 
constraints from the data. “The best way to 
prevent discrimination may be to authorize 
an algorithm to consider information about 
gender and race,” she argues. You can’t fix 
what you can’t measure. 

Lobel also advocates for government in- 
vestment in ensuring that consumer data 
capture underrepresented demographics 
and makes the case for the creation of pro- 
gressive public-private infrastructure and 
governance systems to manage and oversee 
the deployment of new technology. Like new 
pharmaceuticals, new algorithms would 
benefit from more comprehensive oversight, 
she argues. 

“Progress supersedes perfection,” insists 
Lobel. Rather than perpetuating the idea 
that technology must meet unattainable 
measures of fairness before we can move for- 
ward, she directs us to leverage the imper- 
fect technologies we already have to make 
the world a better place. 
COVID-19 


Partisanship and the pandemic 


Political polarization shaped attitudes and outcomes related to COVID-19 


By Matthew S. Levendusky 


Early in the COVID-19 pandemic, 
the public learned the term 
“comorbidities”—a word that denotes 
the factors that make people more 
susceptible to poor disease outcomes. 
In their new book, Pandemic Politics, 
political scientists Shana Kushner Gadar- 
ian, Sara Wallace Goodman, and Thomas 
Pepinsky show that partisanship was a 
COVID-19 comorbidity in the United States. 
The pandemic has been deadlier and more 
contentious than it otherwise would have 
been, they argue, because it became a political 
issue, beyond just a public health emergency. 

Drawing on a six-wave panel dataset span- 
ning the first 13 months of the pandemic, the 
authors document how partisanship shaped 
public attitudes toward this crisis in the 
United States. Democrats and Republicans, 
they note, took sharply divergent positions 
on nearly all aspects of the pandemic, from 
masking, to stay-at-home orders, to concern 
about the virus, to views of the economy, 
to vaccines. They show that partisanship— 
more so than any other variable—best 
explains pandemic attitudes. 

The authors argue that this partisan polar- 
ization stems from the actions of one man: 
Donald Trump. Trump’s actions throughout 
the pandemic consistently signaled that it 
was not a serious threat: He called to re- 
open the economy in the spring of 2020, he 
publicly disputed Anthony Fauci and other 
scientific experts, he held large in-person 
campaign rallies without requiring masks or 
social distancing, and he even downplayed 
the virus after it sent him to the hospital. This 
stood in stark contrast with the attitudes and 
behaviors of Joe Biden and other Democratic 
politicians, giving citizens a clear signal 
about how pandemic attitudes mapped onto 
partisanship and leading to the polarization 
the authors so carefully document. 

The book will likely become the definitive 
record of how partisanship in the US shaped 
attitudes during an incredibly consequential 
period of history. That said, 
there are two areas where I wish the authors 
had pushed further. 


The reviewer is at the Department of Political Science, 
University of Pennsylvania, Philadelphia, PA 19104, USA. 
Email: mleven@upenn.edu 
The first has to do with the actors who were 
responsible for infusing partisanship into 
the country’s pandemic response. Few would 
quibble with the argument that Trump’s ac- 
tions were the casus belli of the pandemic’s 
politicization and subsequent polarization. 
But it was state governors who issued stay- 
at-home orders, and their decisions differed 
greatly from state to state. And, as Trump’s 
team noted, Fauci all but endorsed Joe 
Biden in the run-up to the election. Meanwhile, 
pro-masking advocates—as much as 
anti-maskers—turned facial coverings into 
a political symbol (1). Further, US public 
health agencies—most notably the Centers 
for Disease Control and Prevention and 
the Food and Drug Administration—made 
several highly publicized errors during the 
pandemic, which damaged their credibility, 
at least with some sectors of the public. 


Masking went from public health measure to political act during the COVID-19 pandemic. 


Second, no one would dispute the book’s 
argument that partisanship is one of the 
predictors, if not the best predictor, of at- 
titudes toward the pandemic. Indeed, sev- 
eral published papers—including some by 
the book’s authors—have documented this 
finding. But in a book that runs to more 
than 300 pages, readers will want to see 
more-subtle explorations of partisanship’s 
effects. If partisanship is a social identity, 
for example, then its strength should vary 
over time according to cues from highly visible 
actors and from the broader political 
environment. Although many of the book’s 
graphs show patterns over time, the authors 
could have done more to unpack how the 
factors they discuss related to this variation. 


Pandemic Politics 

Shana Kushner Gadarian, 
Sara Wallace Goodman, and 
Thomas B. Pepinsky 
Princeton University Press, 
2022. 400 pp. 

The pandemic’s severity also waxed and 
waned across space and time. Social iden- 
tity theory and motivated reasoning would 
predict that these variations would alter 
the effects of partisanship—but did they? 
It may be that these effects only mattered 
early on and that attitudes simply hard- 
ened later to the point where they were 
largely unaffected by the pandemic’s peaks 
and lulls (2). But even if that was the case, 


it would have been valuable to discuss such 
results here, as these non-effects are im- 
portant in light of the underlying theories. 

These issues aside, readers will appre- 
ciate the care that went into this work as 
well as the depth of the authors’ findings, 
which highlight just how extensively parti- 
sanship shaped the public’s response to the 
COVID-19 pandemic. This book is destined 
to end up on the shelf of anyone interested 
in public health and public opinion. 
PROTEIN DESIGN 
Deep learning takes on protein design 


Deep learning approaches such as AlphaFold and RoseTTAFold 
have made reliable protein structure prediction broadly accessible. 
For the inverse problem, finding a sequence that folds to 
a desired structure, most approaches remain based on energy 
optimization. In two papers, a range of protein design problems 
were addressed through deep learning methods. Dauparas et al. 
built on recent deep learning protein design approaches to develop a 
method called ProteinMPNN. They validated designs experimentally 
and showed that ProteinMPNN can rescue previously failed designs 
made using Rosetta or AlphaFold. Wicky et al. started from a random 
sequence and used a Monte Carlo sequence search coupled with 
structure prediction by AlphaFold to design cyclic homo-oligomers. 
Although the designs were generated to achieve stable expression, 
the sequences had to be regenerated using ProteinMPNN. 
This approach allowed for the design of a range of experimentally 
validated cyclic oligomers and paves the way for the design of 
increasingly complex assemblies. —VV 

Science, add2187, add1984, this issue p. 49, p. 56 


METALLURGY 
A little expansive 


Invar alloys have extremely low 
thermal expansion, making 
them attractive for several types 
of applications. Finding these 
types of alloys in a complex 
compositional space, however, 
is challenging. Rao et al. used an 
iterative scheme that combines 
machine learning, density functional 
theory, experiments, and 
thermodynamic calculation 
to find two new invar alloys 
out of millions of candidates 
(see the Perspective by Hu and 
Yang). The alloys are both 
compositionally complex, 
high-entropy materials, thus 
demonstrating the power of this 
approach for materials discovery. —BG 
Science, abo4940, this issue p. 78; 
see also ade5503, p. 26 


MONKEYPOX 
Sexually associated 
outbreaks 


Monkeypox cases have 
occurred sporadically around 
the world for several decades. 
However, May of 2022 saw hundreds 
to thousands of reports 
outside of endemic regions. 
Despite its not being a classical 
sexually transmitted 
disease, the outbreak outside 
of Africa is concentrated in 
the community of men who 
have sex with men (MSM). 
Endo et al. found substantial 
transmission among the MSM 
community and estimate high 
values for a basic reproduction 
number (R₀) greater 
than 1.0 within the context 
of an MSM sexual network. 
Fortunately, the smallpox 
vaccine is effective against 
monkeypox. —CA 

Science, add4507, this issue p. 90 


CANCER GENOMICS 

Not just any 
polymorphism 

It is common to find polymorphisms, 
or variations in the gene 
sequence, that correlate with 
risks of various diseases, but 
such correlations do not indicate 
that the genetic changes actually 
cause disease. Yanchus et al. 
focused on a specific single-nucleotide 
polymorphism that is 
correlated with increased risk of 
one subtype of brain tumors. The 
authors performed detailed analysis 
of the gene locus where this 
polymorphism is located and 
then created mouse models that 
enabled them to directly confirm 
its pathogenic effects. They also 
used mouse models and human 
tumor samples to demonstrate 
the effects of the polymorphism 
on gene transcription patterns, 
uncovering a mechanism for its 
tumorigenic effects. —YN 
Science, abj2890, this issue p. 68 


36 7 OCTOBER 2022 » VOL 378 ISSUE 6615 


TRANSCRIPTION 
When transcription meets 
the +1 nucleosome 


Eukaryotic transcription initia- 
tion starts with the assembly 
of a preinitiation complex (PIC) 
on core promoters covering 
transcription start sites (TSS). 
The +1 nucleosome, the first 
downstream nucleosome from 
a TSS, consists of characteristic 
epigenetic marks and is com- 
monly thought to be a barrier 
to early transcription. Chen 
et al. determined the struc- 
tures of nucleosome-bound 
PIC-Mediator. Structural and 
biochemical analyses showed 
that the nucleosome makes 
multiple contacts with PIC- 
Mediator and may enhance 
transcription activity if the 
nucleosome is positioned 40 or 
50 base pairs (but not 70 base 
pairs) downstream of TSS. This 
study provides a structural basis 
for understanding the regulatory 
function of the +1 nucleosome in 
transcription initiation. —DJ 
Science, abn8131, this issue, p. 62 


T CELLS 
Bedtime for Tregs 


Regulatory T cells (Tregs) in visceral 
adipose tissue (VAT) are key 
to regulating local and systemic 
metabolism. Circadian rhythm 
pathways are up-regulated in 
tissue Tregs, but it is unclear how 
they affect VAT Tregs. Xiao et al. 
determined the transcriptomic 
and metabolic profiles of VAT Tregs 
at various circadian time points 
that expressed or were deficient 
in genes that control circadian 
rhythms. VAT Tregs had altered 
phenotypes at various times during 
the circadian cycle. Ablation 
of a core clock gene led to VAT Treg 
constitutive activation, resulting 
in altered metabolism, fitness 
loss, and greater suppression of 
adipocyte lipolysis. —DAE 

Sci. Immunol. 7, eabl7641 (2022). 


SCIENCE science.org 


CANCER 
PLK1 moves beyond 
mitosis in lung cancer 


Increased levels of the mitotic 
kinase PLK1 in various cancers 
correlate with poor prognosis. 
Kong et al. identified a non- 
mitotic mechanism through 
which PLK1 fuels tumor growth. In 
Kras-mutant mouse lung adeno- 
carcinoma cells, PLK1 kinase 
activity increased the expression 
of RET, which encodes a recep- 
tor tyrosine kinase. Together, 
KRAS and RET activated the 
MAPK pathway, which promotes 
tumor growth. Combining clini- 
cally approved RET and MAPK 
pathway inhibitors induced tumor 
regression and prolonged survival 
in a mouse model of PLK1-overexpressing 
lung cancer. —LKF 
Sci. Signal. 15, eabj4009 (2022). 


COMMUNITY ECOLOGY 
Predicting microbial 
community dynamics 


Ecological communities are 
complex, and their dynamics are 
difficult to predict. Hu et al. used 
experimental bacteria communi- 
ties to show that species richness 
and the strength of species 
interactions drive community 
dynamics (see the Perspective 
by Huelsmann and Ackermann), 
as predicted by theory. By 
manipulating species pool size 
and nutrient availability (which 
affects species interactions), 
the authors found that more 
complex communities were less 
stable over time. Furthermore, 
communities showed distinct 
phases, shifting from full species 
coexistence to partial coexis- 
tence to dynamic fluctuations as 
species pool size or interaction 
strength increased. Communities 
that fluctuated maintained higher 
species richness, suggesting that 
diversity and stability promote 
one another. —BEL 

Science, abm7841, this issue p. 85; 

see also ade2516, p. 29 
DEVELOPMENT 


Edited by Caroline Ash 
and Jesse Smith 


Modeling mammalian embryo assembly 


How the mammalian embryo self-assembles correctly is 
mysterious, in part because the embryo is inaccessible 
after implantation. This stage of development is our big- 
gest bottleneck in life, and most embryos do not survive 
implantation. Postimplantation embryo-like structures 
have been assembled in vitro from a mixture of three stem 
cell types: embryonic stem cells, trophoblast stem cells, and 
extra-embryonic stem cells. To gain further insight into the 
mechanisms involved, Bao et al. undertook a combination of 
biophysical measurements, mathematical modeling, and func- 
tional perturbations of natural and stem cell-derived mouse 
embryos. They studied events in cadherin cell-cell adhesion 
that drive sorting in natural embryos before and after implanta- 
tion. Model embryos bypassed implantation but still followed a 
similar set of adhesion events during assembly and consolida- 
tion. The efficiency of sorting and self-assembly of embryo-like 
structures was increased by standardizing cell-type-specific 
cadherin expression. —SMH 
Nat. Cell Biol. 24, 1341 (2022). 


A human blastocyst depicted as it initiates implantation, a process 
that conceals it from observers trying to understand this vulnerable 


early stage of pregnancy. 


METABOLISM 
Sugar-microbiome-immune axis 


Sugar in the context of a high-fat 
diet may act indirectly through 
effects on the microbiome and 
intestinal immune cells. Kawano 
et al. investigated the roles of type 
3 innate lymphoid cells (ILC3s) 
and T helper 17 (TH17) cells, 
which produce interleukin-17. The 
authors found that ingested sugar 
selected for intestinal bacterial 
species that outcompete species 
of bacteria that would stimulate 
TH17 cell proliferation. The trouble 
is that TH17 cells also inhibit lipid 
uptake in mice on a high-fat, 
high-sugar diet, so their depletion 
is metabolically problematic 
for the host. Removing sugar 
from high-fat diets prevents 
mice from becoming obese. The 
complex interplay between diet 
and immune responses to the gut 
microbiota is further complicated 
by the spectrum of microbial 
community compositions found 
among human beings. —LBR 
Cell 10.1016/j.cell.2022.08.005 (2022). 


GEOCHEMISTRY 


Deep-rooted change 


The appearance of land plants 
(such as the ones in this 
illustration of the Silurian) 
affected the composition of 
continental crust. 


The appearance of land plants over 400 million years ago dramatically shaped the surface 
and atmosphere of Earth. Spencer et al. suggest that plants may have also had an impact 
on the composition of the continental crust. Using chemical tracers in the mineral zircon 
formed in subduction zones, the authors found a substantial change in weathering that 
affected the composition of subducted sediments at that time. This process in turn 
altered the composition of continental crust formed by the deep melting and eruption of these 
sediments. —BG 
Nat. Geosci. 15, 735 (2022). 


NEUROSCIENCE 
Memory during 
wakefulness 


The significance of sleep for 
long-term memory formation is 
well recognized. But what about 
the waking state? Does the 
wakeful mode of brain activity 
also promote memory consoli- 
dation? Using a novel object 
recognition task, Sawangjit et al. 
compared long-term memory 
formation in rats that had slept or 
remained awake during a critical 
2-hour period after encoding 
the memory. Remote novel 


object recognition memory was 
assessed a week later. Unlike 
sleep-dependent consolidation, 
wake consolidation strengthened 
a context-independent repre- 
sentation of objects and was 
independent of hippocampal 
function. Therefore, the brain's 
state of wakefulness is associ- 
ated with a distinct mode of 
long-term memory formation 
that is partially associated with 
different memory traces. —PRS 


Proc. Natl. Acad. Sci. U.S.A. 
119, e2203165119 (2022). 


ORGANIC SYNTHESIS 
Easier access to “MnCl₃” 


Manganese(III) chlorides are 
useful for dichlorination of 
double bonds, but accessing this 
species in solution as a stable, 
stoichiometric reagent has been 
challenging. Saju et al. report 
a room-temperature synthesis 
and structural characterization 
of [MnCl₃(OPPh₃)₂] (where Ph 
is phenyl) for such reactions. 
This compound was reported 
in 1976, but at that time, it was 
made through a low-temperature 
route. In the present synthesis, 
Mn(OAc)₂, KMnO₄, and Me₃SiCl 
were combined in a 4:1:16 ratio 
(where Ac is acetate and Me is 
methyl) and then treated with 
eight equivalents of Ph₃PO. X-ray 
crystallography of the deep blue 
crystals revealed a rare, 
five-coordinate Mn(III) structure. 
This compound was used for 
dichlorination of several types of 
alkenes in a wide range of organic 
solvents without acid co-additives. —PDS 


J. Am. Chem. Soc. 144, 16761 (2022). 


CANCER 


A potential combination 
Glioblastoma is a highly 
invasive tumor with poor 
prognosis. Tumor vasculature 
and autophagy are important 
determinants for glioblastoma 
progression. Chryplewicz 

et al. used mouse models 


of gliomagenesis to explore 
inhibitors of vascular endothelial 
growth factor receptor (VEGFR). 
A combination of VEGFR inhibitor 
with the tricyclic antidepressant 
imipramine, which increases 
autophagy, significantly delayed 
tumor growth. It is possible that 
the therapy-induced increase in 
autophagy is immunogenic and 
drives antitumor immunity. The 
authors observed increased CD8+ 
and CD4+ T cell infiltration, which 
was dependent on autophagy 
in glioblastomas. They also 
found that imipramine altered 
the phenotype of macrophages 
in the glioblastoma microenvi- 
ronment, skewing them away 
from immunosuppression. The 
drug combination appeared to 
remodel the tumor microenviron- 
ment to favor antitumor T cell 
responses, so these responses 
might be further enhanced by the 
addition of immune checkpoint 
inhibitors. —GKA 
Cancer Cell 10.1016/ 
j.ccell.2022.08.014 (2022). 


OPTOMECHANICS 
Cooling optically 
bound pairs 


The ability to trap, levitate, and 
manipulate microscopic particles 
with beams of laser light provides 
access to exquisitely sensitive 
force measurements. Increasing 
the number of trapped particles 
and then being able to sense 
the interactions between them 
is expected to enhance that 
sensitivity and enter the regime 
of mechanical quantum systems. 
Arita et al. have developed an 
optical tweezer system that can 
trap a pair of dielectric micro- 
spheres and cool one of them to 
sub-Kelvin temperatures through 
a process of sympathetic cooling. 
The two microparticles are opti- 
cally bound, with the interaction 
strength dependent on the sepa- 
ration between the traps. Cooling 
to even lower temperatures, 
possibly with arrays of trapped 
microparticles, may help to real- 
ize exotic proposals to measure 
such things as quantum friction, 
dark matter, and quantum gravity. —ISO 

Optica 9, 1000 (2022). 
CELL BIOLOGY 
LYSET helps load 
lysosomes 


Lysosomes are major degra- 
dative compartments within 
the cell, and their dysfunction 
results in both rare and com- 
mon disorders. Certain viruses, 
including severe acute respira- 
tory syndrome coronavirus 2 
(SARS-CoV-2), hijack lysosomes 
to gain entry into the cell and 
start their destructive infection 
cycle. Richards et al. identified a 
small protein named LYSET that 
is critical for proper lysosomal 
function. In cells lacking LYSET, 
the trafficking of enzymes to 
the lysosomes was severely 
disrupted, resulting in the 
accumulation of undigested 
material in the lysosome. 
Independently, Pechincha et 
al. identified LYSET as being 
selectively essential when cells 
feed on extracellular proteins. 
Cancer cells commonly rely on 
extracellular proteins to provide 
amino acids. LYSET helped to 
anchor N-acetylglucosamine- 
1-phosphotransferase in Golgi 
membranes for tagging enzymes 
with the lysosomal trafficking 
signal mannose-6-phosphate. 
Without LYSET, lysosomes were 
depleted of catabolic enzymes, 
losing their ability to digest 
extracellular proteins. —SMH 
Science, abn5637, abn5648, 
this issue p. 39, p. 40 


DEVELOPMENT 
Waking up the genome 
in human embryos 


How zygotic genome activation 
(ZGA), the first gene expression 
event in life, is started in humans 
remains poorly understood. ZGA 
depends on translational activi- 
ties at the preceding stages. 

Zou et al. profiled the transla- 
tome and transcriptome from 
the same samples of human 
oocytes and early embryos. 
Comparison with corresponding 
data from mice revealed both 
conserved and human-specific 




translational activities during 
the oocyte-to-embryo transition. 
The authors identified the TPRX 
transcription factor family, which 
includes TPRXL, a maternally 
inherited factor, and TPRX1/2, 
two early transcribed transcrip- 
tion factors after fertilization, as 
being critical regulators for ZGA 
and embryonic development in 
humans. —DJ 

Science, abo7923, this issue p. 41 


CORONAVIRUS 


Surveillance across Africa 
The past 2 years, during which 
waves of severe acute respira- 
tory syndrome coronavirus 2 
(SARS-CoV-2) variants swept 
the globe, have starkly high- 
lighted health disparities across 
nations. Tegally et al. show how 
the coordinated efforts of talented 
African scientists have in a 
short time made great contributions 
to pandemic surveillance 
and data gathering. Their efforts 
and initiatives have provided 
early warning that has likely ben- 
efited wealthier countries more 
so than their own. Genomic 
surveillance identified the emer- 
gence of the highly transmissible 
Beta and Omicron variants and 
now the appearance of Omicron 
sublineages in Africa. However, 
it is imperative that technology 
transfer for diagnostics and 
vaccines, as well as the logistic 
wherewithal to produce and 
deploy them, match the data- 
gathering effort. —CA 

Science, abq5358, this issue p. 42 


MICROBIOLOGY 
Informing bacterial 
spores 


Bacterial spores can spend years 
in a dormant, biochemically 
inactive state, yet they retain 

the ability to process informa- 
tion from cues that can release 
them from dormancy and allow 
them to undergo germination. 
Kikuchi et al. present a mathematical 
model and experimental 
evidence that bacterial spores 


use an electrochemical poten- 
tial caused by a gradient of 
potassium across the spore 
membrane to sense their envi- 
ronment (see the Perspective 
by Lombardino and Burton). 
Pulses of nutrient, which can 
combine to release spores from 
dormancy, cause changes in the 
electrochemical potential that 
could be integrated over time to 
allow the spore to monitor such 
cues with a mechanism that 
does not require energy produc- 
tion in the dormant spore. —LBR 
Science, abl7484, this issue p. 43; 
see also ade3921, p. 25 


NEURODEGENERATION 
Endosomal changes 
tied to disease 


Frontotemporal dementia and 
amyotrophic lateral sclerosis 
share key genetics and pathol- 
ogy, but the connection between 
different known facets of their 
disease biology is not always 
clear. Shao et al. discovered an 
interplay between the disease- 
associated genes C9orf72 and 
TBK1. Large repeats of glycine- 
alanine, which are produced by 
an expansion in C9orf72, seques- 
tered TBK1 into inclusions, 
inhibiting its function and impair- 
ing the downstream endosomal 
pathway (see the Perspective by 
Gallo and Edbauer). A mutation 
in TBK1 worsened these defects, 
enhancing disease phenotypes 
in mice. Remarkably, the disrup- 
tion of the endosomal pathway 
also proved sufficient to induce 
the aggregation of TAR-DNA 
binding protein 43 (TDP-43), 
a key driver of degeneration in 
these diseases. —SMH 

Science, abq7860, this issue p. 94; 

see also ade4210, p. 28 


CANCER 


SNPing out metastasis 
Roughly 50% of all patients with 
lung cancer develop metastatic 
disease, but the risk factors for 
predisposition to the development 
of metastases are unknown. 
Using sequencing data from 
a cohort of patients with lung 
adenoma, Liu et al. identified 
a single nucleotide polymorphism 
(SNP) in the gene Breast 
Cancer Metastasis Suppressor 1 
(BRMS1) that is associated with 
increased risk for lung adenoma 
metastases. They further 
investigated this SNP in cell lines 
and patient-derived xenografts, 
demonstrating an increase in 
c-fos that was correlated with 
increased aggressiveness of 
tumors. The authors observed 
that treatment with a c-fos 
inhibitor, tetracycline, reduced 
metastasis in SNP-bearing mice, 
suggesting a promising treat- 
ment that will require further 
clinical validation. —DLH 

Sci. Transl. Med. 14, eabo1050 (2022). 


CORONAVIRUS 
Broadly neutralizing 
from the start 


Recent severe acute respira- 
tory syndrome coronavirus 2 
(SARS-CoV-2) variants have 
shown increasing levels of 
immune evasion, with many 
mutations occurring within the 
receptor-binding domain of 
the spike protein. Kumar et al. 
isolated several human mono- 
clonal antibodies from the B 
cells of individuals in India who 
recovered from infection by the 
ancestral WA.1 virus. Using live 
viral isolates, the authors found 
potent neutralization of the 
Alpha, Beta, Gamma, Delta, and 
Omicron sublineages. Further 
studies showed that one of the 
antibodies targets a conserved 
epitope on the outer face of 
the spike protein trimer. Such 
antibodies may be useful for the 
development of broadly neutral- 
izing antibody therapeutics for 
SARS-CoV-2 variants. —SJW 
Sci. Adv. 10.1126/ 
sciadv.add2032 (2022). 
CELL BIOLOGY 


Lysosomal enzyme trafficking factor LYSET enables 
nutritional usage of extracellular proteins 


Catarina Pechincha, Sven Groessl†, Robert Kalis†, Melanie de Almeida, Andrea Zanotti, 
Marten Wittmann, Martin Schneider, Rafael P. de Campos, Sarah Rieser, Marlene Brandstetter, 
Alexander Schleiffer, Karin Müller-Decker, Dominic Helm, Sabrina Jabs, David Haselbach, 
Marius K. Lemberg, Johannes Zuber*, Wilhelm Palm* 


INTRODUCTION: Mammalian cells are surrounded 
by a range of different nutrients, including 
amino acids and extracellular proteins. In 
nutrient-rich conditions, cells prefer to im- 
port free amino acids to meet their nutritional 
demands. However, most amino acids in cir- 
culation and in the extracellular space are 
contained within proteins. Cells can ingest 
proteins from the environment and deliver 
them to lysosomes—organelles with digestive 
enzymes that break proteins down into their 
constituent amino acids. By generating an intra- 
cellular nutrient source, lysosomes can sustain 
cellular functions during starvation. This pro- 
cess is commonly exploited by cancer cells, which 
can feed on extracellular proteins to thrive in 
poorly vascularized, nutrient-poor tumors. How- 
ever, the molecular pathways that enable cells to 
use extracellular proteins as a nutrient source 
remain incompletely understood. 


RATIONALE: We set out to identify genes that 
are essential for survival and growth when 
cells rely on extracellular proteins as nutrients. 
Genetic screens have been instrumental in 
functionally characterizing genes in mamma- 
lian cells and identifying genes that become 
essential in specific cancer contexts. However, 
such screens have commonly been performed 
in cell culture media that provide most amino 
acids at supraphysiological levels while being 
strongly depleted in extracellular proteins. Con- 
ceivably, such unphysiological nutrient mixtures 
enforce metabolic activities that differ from 
in vivo phenotypes. To address this, we devel- 
oped screening conditions where cancer cells 
grow either by the import of free amino acids 
or by the uptake and lysosomal degradation of 
extracellular proteins. 


LYSET enables lysosomal nutrient generation. LYSET anchors GlcNAc-1-phosphotransferase (GlcNAc-PT) 
in the Golgi for tagging catabolic enzymes with the lysosomal trafficking signal, mannose-6-phosphate (M6P). 
The generation of catabolically active lysosomes enables cells to acquire amino acids through the breakdown of 
extracellular proteins. Loss of LYSET abrogates the mannose-6-phosphate pathway, depletes lysosomes of 
catabolic enzymes, and blocks the survival and growth of cells that rely on extracellular proteins as nutrients. 


RESULTS: Through CRISPR screens in defined 
nutrient environments, we identified LYSET, 
a transmembrane protein (TMEM251) selectively 
required for cell survival and growth 
when extracellular proteins were an obligatory 
amino acid source. Mechanistically, we 
characterized LYSET as a protein that anchors 
GlcNAc-1-phosphotransferase in Golgi 
membranes for tagging catabolic enzymes with 
the lysosomal trafficking signal, mannose-6- 
phosphate. GlcNAc-1-phosphotransferase stability 
depended on LYSET because of its transmem- 
brane domain, which was found to contain mul- 
tiple hydrophilic amino acid residues and to 
co-occur with LYSET in multicellular animals. 
Without LYSET, the GlcNAc-1-phosphotransferase 
transmembrane domain was unstable, which 
led to degradation of the protein. Consequently, 
catabolic enzymes did not reach the lysosome 
and were instead mistrafficked to the cell sur- 
face. LYSET-deficient cells were unable to gen- 
erate nutrients through lysosomal breakdown 
of proteins and accumulated lysosomes that 
were filled with undigested cargo. Although 
LYSET-deficient cancer cells grew normally 
under nutrient-rich conditions, they failed to 
grow in amino acid-poor environments and 
displayed a severely reduced ability to form 
tumors in mice. 


CONCLUSION: Our results identified LYSET as a 
core component of the mannose-6-phosphate 
pathway for lysosomal enzyme trafficking. A 
clue for the function of LYSET came from our 
discovery that GlcNAc-1-phosphotransferase 
contains an energetically unfavorable trans- 
membrane domain, which was predicted to 
depend on LYSET for stable membrane in- 
tegration. The co-occurrence of LYSET and the 
unstable GlcNAc-1-phosphotransferase trans- 
membrane domain in the same organisms 
suggests that they became functionally linked 
during evolution of the mannose-6-phosphate 
pathway. Conceivably, controlling GlcNAc-1- 
phosphotransferase levels through LYSET con- 
stitutes a mechanism to regulate lysosomal 
enzyme trafficking. LYSET is relevant for 
several human pathologies: LYSET-deficient 
cells lack a functional mannose-6-phosphate 
pathway, which provides a mechanistic ex- 
planation for the association of LYSET muta- 
tions with hereditary syndromes that resemble 
the lysosomal storage disorders mucolipidosis 
II and III. Moreover, LYSET enables cancer
cells to exploit extracellular proteins as a nu- 
trient source, thereby gaining metabolic flex- 
ibility and resilience. Thus, inhibiting LYSET 
and the lysosomal enzyme trafficking pathway 
might be a promising strategy to suppress a 
key metabolic adaptation in cancer. 
Mammalian cells can generate amino acids through macropinocytosis and lysosomal breakdown
of extracellular proteins, which is exploited by cancer cells to grow in nutrient-poor tumors. Through
genetic screens in defined nutrient conditions, we characterized LYSET, a transmembrane protein
(TMEM251) selectively required when cells consume extracellular proteins. LYSET was found to associate
in the Golgi with GlcNAc-1-phosphotransferase, which targets catabolic enzymes to lysosomes
through mannose-6-phosphate modification. Without LYSET, GlcNAc-1-phosphotransferase was unstable
because of a hydrophilic transmembrane domain. Consequently, LYSET-deficient cells were depleted
of lysosomal enzymes and impaired in turnover of macropinocytic and autophagic cargoes. Thus,
LYSET represents a core component of the lysosomal enzyme trafficking pathway, underlies the
pathomechanism for hereditary lysosomal storage disorders, and may represent a target to suppress
metabolic adaptations in cancer.


When nutrients are abundant, mammalian
cells preferentially meet their demands
for exogenous amino acids by importing
monomeric amino acids through plasma
membrane transporters. However, most extracellular biomass in
circulation and the extracellular space is con- 
tained within proteins, which can be broken 
down into amino acids intracellularly through 
endocytosis and lysosomal catabolism (1). By 
tapping into the copious nutrient stores of 
extracellular proteins, cells sustain a supply of 
bioenergetic and biosynthetic precursors dur- 
ing periods of starvation. Macropinocytosis, 
a nonselective endocytic pathway, and lyso- 
somal catabolic activity are up-regulated by 
multiple oncogenic signaling pathways and 
are frequently elevated in cancer (2, 3). The
resulting increase in uptake and lysosomal 
breakdown of extracellular proteins consti- 
tutes an alternative amino acid source that 
enables malignant cells to grow in nutrient- 
poor tumor environments (4-6). 

The nutrient environment profoundly influ- 
ences the activities and traits of mammalian 
cells (7). Genetic screens have been instrumen- 
tal in functionally annotating mammalian 
genes and identifying context-specific depen- 
dencies in cancer cell lines, but such screens are 
commonly performed under unphysiological 
nutrient conditions. For example, the balance 
between monomeric and protein-contained 
essential amino acids in standard cell culture 
media is skewed by one to two orders of mag- 
nitude compared with human plasma (8, 9). 
As a corollary, the molecular pathways that 
support a metabolic state in which cells sur- 
vive and grow by using extracellular proteins 
as nutrients remain incompletely understood. 
In this study, through proliferation-based CRISPR
screens in defined nutrient conditions, we iden- 
tify the transmembrane protein LYSET as a 
core component of the lysosomal enzyme traf- 
ficking pathway that is critical for lysosomal 
physiology, nutritional usage of extracellular 
proteins, and metabolic adaptations in cancer. 


Results 

A CRISPR screen identifies LYSET (TMEM251) 
as selectively required when cancer cells feed 
on extracellular proteins 


To identify cellular pathways that are required 
for nutrient generation from extracellular pro- 
teins, we established screening conditions 
under which cell growth was supported either
by the uptake of monomeric amino acids or by 
macropinocytosis and lysosomal catabolism 
of extracellular proteins (Fig. 1A). To selec- 
tively enforce either nutrient-acquisition path- 
way, the essential amino acid leucine was 
provided in its monomeric form or contained 
within albumin, the major human plasma pro- 
tein (Fig. 1B). As the primary screening model, 
we chose MIA PaCa-2 cells, which are derived 
from pancreatic adenocarcinoma—a nutrient- 
deficient tumor type in which extracellular 
proteins constitute an important amino acid 
source (4-6). When albumin was supplemented 
at physiological levels (3 to 4%), MIA PaCa-2 
cells could sustain proliferation in leucine- 
deficient media through lysosomal albumin 
catabolism (fig. S1A). 

To initiate gene editing at the onset of nutrient 
deprivation, we engineered a single cell-derived 
MIA PaCa-2 clone harboring doxycycline-inducible 
Cas9 (iCas9) (fig. S1B) (10). This cell line did not
display basal Cas9 activity, rapidly triggered 
CRISPR editing upon doxycycline addition, and 
robustly proliferated using albumin as a leucine 
source (fig. S1, C to E). MIA PaCa-2 iCas9 cells 
were transduced with the Vienna genome-wide 
single-guide RNA (sgRNA) library (11) and,
upon Cas9 induction, were passaged in four 
media conditions that were either rich or de- 
ficient in leucine and/or albumin (Fig. 1C and 
fig. S1F). By quantifying changes in sgRNA
representation in the different culture condi- 
tions over time, we identified multiple genes 
that were selectively required when albumin 
was an obligatory nutrient (data S1 and S2). 
These included endolysosomal trafficking reg- 
ulators, amino acid transporters, and compo- 
nents of the amino acid-sensing mTORC1 and 
GCN2-ATF4 pathways (Fig. 1D and fig. S1G),
which were validated to be selectively essen- 
tial when albumin was an obligatory leucine 
source (fig. S2A). 
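
The comparison of sgRNA representation between culture conditions can be illustrated with a short Python sketch. It is not the analysis pipeline used for the screen: the read counts are hypothetical, the pseudocount and the use of the median across sgRNAs are assumptions made for illustration, and only the cutoffs (P < 0.01, log2 fold change < -2 or > 1.5) are taken from the Fig. 1D legend.

```python
import math
from statistics import median

def gene_log2_fc(counts_test, counts_ref, pseudocount=1.0):
    """Median log2 fold change across the sgRNAs targeting one gene.

    counts_test / counts_ref: normalized read counts per sgRNA in the
    test condition (e.g. leucine-deficient + albumin) and the reference
    condition (e.g. AA-rich). The pseudocount avoids division by zero.
    """
    lfcs = [
        math.log2((t + pseudocount) / (r + pseudocount))
        for t, r in zip(counts_test, counts_ref)
    ]
    return median(lfcs)

def classify(log2_fc, p_value, essential=-2.0, detrimental=1.5, alpha=0.01):
    """Apply the cutoffs stated in the Fig. 1D legend."""
    if p_value < alpha and log2_fc < essential:
        return "essential"
    if p_value < alpha and log2_fc > detrimental:
        return "detrimental"
    return "not significant"

# Hypothetical counts for four sgRNAs targeting a single gene.
albumin_counts = [15, 7, 31, 3]
aa_rich_counts = [255, 127, 255, 63]

lfc = gene_log2_fc(albumin_counts, aa_rich_counts)
print(lfc, classify(lfc, p_value=0.001))
```

In this toy input the sgRNAs are depleted roughly 16-fold in the albumin condition, so the gene would be called essential under the legend's cutoffs, provided its P value (supplied here by hand) is below 0.01.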

One prominent hit that was selectively essen- 
tial when cells fed on albumin was the trans- 
membrane protein TMEM251 (Fig. 1D and fig. 
S1G), which hereafter is referred to as lysosomal 
enzyme trafficking factor (LYSET). To validate 
this selective dependency, we generated LYSET 
inducible knockout (iKO) MIA PaCa-2 cells 
(fig. S2B), as well as single cell-derived LYSET 
knockout (KO) clones from parental MIA PaCa-2 
cells (Fig. 1E). LYSET KO cells grew normally
under standard culture conditions. However,
LYSET KO cells failed to proliferate when placed
in leucine-deficient, albumin-supplemented medium
(Fig. 1F) and were rapidly outcompeted
in competitive proliferation assays (fig. S2C). 
We deleted LYSET in a range of human and
murine cell types, including additional pancreatic
cancer cell lines; colon, lung, and bladder
cancer cells; as well as nontransformed mouse
embryonic fibroblasts (MEFs) and immortalized
baby mouse kidney (iBMK) cells with and
without transformation by HrasG12V to investigate
LYSET's function more broadly. In
line with CRISPR screens from the DepMap
project, which characterize LYSET (TMEM251)
as a widely nonessential gene (12), loss of LYSET
had no effect on cell proliferation under standard
culture conditions in any of the examined
cell lines (Fig. 1G and fig. S3A). By contrast, in
leucine-deficient medium supplemented with
3% albumin, loss of LYSET strongly suppressed
proliferation and viability of all cancer cell lines,
MEFs, and iBMK cells. HrasG12V transformation,
which promotes macropinocytic nutrient
uptake (4), enabled iBMK cells to grow
by feeding on albumin, which was potently suppressed
by genetic ablation of LYSET (Fig. 1G).

To determine the relevance of LYSET for
cancer cell growth in vivo, we examined the
consequences of LYSET deficiency in several
tumor models. Genetic ablation of LYSET impaired
tumor growth of human MIA PaCa-2
and murine EPP2 pancreatic cancer cells upon
subcutaneous transplantation into immunodeficient
mice (fig. S3, B and C). To probe
these effects in the presence of a functional
immune system, we examined protumorigenic
functions of LYSET in a syngeneic colon cancer
model (MC-38) and in an orthotopic transplantation
model of pancreatic cancer (KPC).
LYSET-deficient MC-38 cells were strongly
impaired in subcutaneous tumor growth (Fig.
1, H and I), and LYSET-deficient KPC cells
were rapidly outcompeted by control cells in
orthotopic pancreatic tumors (Fig. 1J), both
in syngeneic immunocompetent and immunodeficient
recipient mice. Thus, LYSET is
dispensable for cell viability and growth in
nutrient-rich conditions but is selectively
required when extracellular proteins are obligatory
nutrients, in amino acid-depleted
conditions in vitro, and during tumor growth
in vivo.

Fig. 1. A genome-wide CRISPR screen identifies LYSET as selectively
essential when cancer cells use extracellular proteins as nutrients.
(A) Mammalian cells can acquire exogenous amino acids (AAs) through the
import of monomeric amino acids or through endocytosis and lysosomal
catabolism of extracellular proteins. (B) Experimental design of the CRISPR
screen to identify regulators of cell proliferation supported by extracellular
proteins as nutrients. (C) Population doublings of MIA PaCa-2 cells during the
CRISPR screen. (D) Gene-level enrichment or depletion of sgRNAs in leucine-deficient
+ 4% albumin medium versus AA-rich medium in the CRISPR screen.
Dashed lines indicate significance (P < 0.01) of genes that are essential [log2
fold change (FC) < -2] or detrimental (log2 FC > 1.5) when cells feed on
extracellular proteins. Selected hits are highlighted. (E) Immunoblot (IB) of
LYSET KO MIA PaCa-2 cells. Ctrl, control. (F) Proliferation of LYSET KO MIA
PaCa-2 cells in leucine-deficient + 4% albumin medium and in AA-rich medium.
Data are represented as means ± SDs (N = 3 replicates). (G) Fold change in cell
number of LYSET KO cell lines after 3 days in AA-rich medium or 4 days in
leucine-deficient + 3% albumin medium. Data are represented as means ± SDs
(N = 3). Dashed lines indicate starting cell numbers. (H and I) Growth of
subcutaneous syngeneic tumors from LYSET iKO MC-38 cells in immunodeficient
Rag2−/− mice (control, N = 6; LYSET iKO, N = 7) (H) and wild-type mice (N = 5) (I).
Data are represented as means ± SEMs. (J) Competitive growth of orthotopic
pancreatic tumors from LYSET iKO KPC cells in syngeneic wild-type (wt) or Rag2−/−
mice. The LYSET iKO:control cell ratio after 14 days, quantified by flow cytometry,
is represented as a mean ± SEM (N = 6), and the dashed line indicates the
initial ratio. For the final time point of tumor growth experiments, P values were
calculated by unpaired two-sided t test with Welch correction.

LYSET is required for lysosomal degradation
of macropinocytic and autophagic cargoes

To determine the subcellular process during 
which LYSET is required for the nutritional 
use of extracellular proteins, we explored pos- 
sible functions in macropinocytosis, endosomal 
trafficking, and lysosomal catabolism. Albumin 
uptake and endosomal cargo trafficking to ly- 
sosomes were not altered in the absence of 
LYSET (fig. S4, A and B). However, lysosomal 
catabolism of a fluorescently labeled bovine 
serum albumin that becomes dequenched upon 
degradation (DQ BSA) was strongly decreased 
(Fig. 2, A and B). Lysosomal catabolism is a point 
of convergence of two metabolic pathways— 
macropinocytosis and autophagy, which sup- 
ply extracellular and intracellular constituents, 
respectively (1, 3). Loss of LYSET led to strong
accumulation of the autophagosomal pro- 
teins p62 and LC3-II, similar to pharmacolog- 
ical suppression of lysosomal catabolism by 
the vacuolar-type adenosine triphosphatase 
(V-ATPase) inhibitor bafilomycin A1 (Fig. 2C).
Lysosomes break down biological macromo- 
lecules through the action of various hydrolytic 
enzymes. In LYSET-deficient cells, enzymatic 
activities of the lysosomal proteases cathepsins 
B and L were strongly reduced, comparable 
to the effect of pharmacological protease in- 
hibition (Fig. 2D). Similarly, activities of the 
lysosomal glycosidases β-galactosidase and
α-mannosidase were barely detectable. Thus,
the failure to degrade macropinocytic and auto- 
phagic cargoes in the absence of LYSET is 
caused by a loss of lysosomal enzyme activity. 
Consequently, albumin-supported cell proliferation
was suppressed to a comparable extent
by LYSET deletion and by lysosomal protease 
inhibition (fig. S4C). In vivo, unlike LYSET de- 
ficiency, genetic ablation of the essential auto- 
phagy component ATG5 did not significantly 
impair tumor growth of MC-38 cells (fig. S4, D
and E). Thus, lysosomal catabolism is required 
for tumor formation in this model, whereas 
autophagic turnover of intracellular constitu- 
ents is largely dispensable. 
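
The figure legends state that P values were calculated by unpaired two-sided t test with Welch correction. As a minimal sketch of what that test computes, the following Python function returns Welch's t statistic and the Welch-Satterthwaite degrees of freedom. The two samples are hypothetical numbers chosen for easy arithmetic, not data from this study.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's unequal-variance t statistic and its degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / na
    se2_b = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1)
    )
    return t, df

# Hypothetical measurements for two groups.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 4), round(dof, 4))
```

The two-sided P value is then read from Student's t distribution with `dof` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` performs the same test and reports the P value directly.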


LYSET is a core component of the 
mannose-6-phosphate pathway for lysosomal 
enzyme trafficking 


To understand why LYSET was required for 
the catabolic activity of lysosomes, we deter- 
mined changes in the proteome of LYSET KO 
MIA PaCa-2 cells using liquid chromatography- 
mass spectrometry (LC-MS) and label-free quan- 
tification (LFQ). From ~2800 robustly quantified 
proteins, <6% were significantly changed in 
LYSET-deficient cells (adjusted P < 0.05). Auto- 
phagic cargo receptors were among the most 
enriched proteins, consistent with the block in 
autophagosomal protein turnover (Fig. 3A and 
data S3). We noted that LYSET-deficient cells 
displayed a global decrease in lysosomal en- 
zymes and other lysosomal luminal proteins. 
To more precisely determine changes in the lyso- 
somal proteome, we enriched iron nanoparticle- 
loaded lysosomes over magnetic columns. From 
49 quantified lysosomal luminal proteins, al- 
most all were strongly depleted in LYSET-deficient 
cells (Fig. 3B and data S3). By contrast, the abundance
of integral and peripheral lysosomal membrane
proteins was not altered in cell lysates (fig.
S5A) and lysosomal fractions (fig. S5B). This suggests
that LYSET is dispensable for lysosomal
biogenesis, per se, but is specifically required
for lysosomal accumulation of catabolic enzymes
and other luminal proteins.

Fig. 2. LYSET is required for lysosomal catabolism of macropinocytic and autophagic cargoes.
(A) Lysosomal DQ BSA degradation in LYSET KO cells, analyzed by microscopy. Scale bars, 20 μm.
(B) Quantification of DQ BSA degradation of cells shown in (A). Data are represented as means ± SDs
(N = 15 fields of view with ≥10 cells). P values were calculated by unpaired two-sided t test with Welch correction.
a.u., arbitrary units. (C) p62 and LC3 levels in LYSET KO cells ± bafilomycin A1 (100 nM) for 3 hours, analyzed
by IB. (D) Enzymatic activity of cathepsin B, cathepsin L, β-galactosidase (β-Gal), and α-mannosidase (α-Man) in
LYSET KO cells. Cathepsin B and L activities were also measured in control cells + protease inhibitors (PI). Data are
represented as means ± SDs (N = 3). Experiments were performed in MIA PaCa-2 cells.

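The proteome comparison above reports proteins as significantly changed at an adjusted P < 0.05. The text does not name the adjustment method, so the sketch below assumes the widely used Benjamini-Hochberg procedure purely for illustration; the input P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values for four proteins.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
print(adj_p, significant)
```

Each raw P value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.
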
Most lysosomal enzymes are synthesized as 
immature proenzymes and subsequently ac- 
tivated through proteolytic processing when 
reaching the lysosome (13). LYSET-deficient 
MIA PaCa-2 cells did not display decreased 
mRNA levels of cathepsin B, cathepsin L, or 
β-hexosaminidase (fig. S5C) or decreased abundance
of their proenzyme forms (Fig. 3C). By
contrast, the mature enzymes were nearly un- 
detectable (Fig. 3C). Similarly, loss of LYSET 
led to a strong decrease in mature lysosomal 
enzymes in other nontransformed and cancer- 
derived human and murine cell types (Fig. 3D 
and fig. S5, D and E). To trace the fate of the 
lysosomal enzymes, we determined their levels 
in the extracellular space. Immature proforms 
of cathepsins B and L were substantially in- 
creased in supernatants from LYSET-deficient 
cells (Fig. 3E). Consistently, secretome analysis 
revealed that lysosomal luminal proteins were 
generally enriched in cellular supernatants (fig. 
S5F and data S3). Thus, without LYSET, most
proteins normally destined for the lysosomal 
lumen were mistrafficked to the extracellu- 
lar space. 

Upon reaching the Golgi, most newly synthe- 
sized lysosomal enzymes receive mannose-6- 
phosphate (M6P) modifications on N-linked 
glycan chains through consecutive action of N- 
acetylglucosamine (GlcNAc)-1-phosphotransferase 
and M6P-uncovering enzyme (UCE), which serves 
as a lysosomal trafficking signal for M6P receptors 
(Fig. 3F) (3, 14). To examine the status of M6P 
modifications in LYSET-deficient cells, we used 
a single-chain antibody fragment that specif- 
ically detects M6P-containing proteins (15). Loss 
of LYSET caused a substantial decrease in cel- 
lular M6P-modified proteins (Fig. 3G and fig. 
S6A). When secretion of newly synthesized lyso- 
somal enzymes was induced by ammonium 
chloride, M6P-modified proteins were readily 
detected in supernatants from control cells 
but not LYSET-deficient cells (Fig. 3H and fig. 
S6A). This suggested that LYSET functions as 
a component of the M6P pathway. Consistently,
GNPTAB, which encodes the α and β
subunits of the GlcNAc-1-phosphotransferase 
complex (16, 17), scored in our CRISPR screen 
as selectively required for albumin-supported 
growth (Fig. 1D). 

To further investigate the relationship be- 
tween LYSET and GNPTAB, we compared the 
phenotypic consequences of deleting either 
gene. In multiple cell lines, genetic ablation 
of LYSET or GNPTAB caused a comparable, 
strong reduction in M6P modifications (Fig. 3, 
G and H) and mature lysosomal enzymes (Fig. 
3G and fig. S6, B and C) as well as increased 
secretion of immature proenzymes (fig. S6D). 
by label-free MS (N = 5). (B) Changes in lysosomal luminal proteins in enriched
lysosomal fractions of LYSET KO MIA PaCa-2 cells, quantified by label-free MS
(N = 5). (C and D) Mature lysosomal enzymes in indicated LYSET KO cancer
cell lines and IMR90 fibroblasts, analyzed by IB. (E) Secretion of immature
cathepsins by LYSET KO MIA PaCa-2 cells, analyzed by IB. (F) Schematic of the
M6P pathway. GlcNAc-PT, GlcNAc-1-phosphotransferase (the α and β subunits


Secretome analysis from LYSET- and GNPTAB- 
deficient MEFs demonstrated indistinguishable 
hypersecretion of lysosomal enzymes (Fig. 3I
and data S4). Consistently, a deficiency in LYSET 
or GNPTAB had the same metabolic conse- 
quences: accumulation of p62 and LC3-II (fig. 
S6, B and C), loss of lysosomal catabolic ac- 
tivity (fig. S6, E to H), and suppression of cell 
proliferation with extracellular proteins as nutrients
(fig. S6I). Besides GNPTAB, two other
proteins participate in the generation of M6P
groups: GNPTG, which encodes the γ subunit of
GlcNAc-1-phosphotransferase, and UCE, which
cleaves the terminal GlcNAc residue from
GlcNAc-phospho-mannose groups (13, 14). Loss of GNPTG
or UCE did not cause an overt reduction in ma- 
ture lysosomal enzymes (fig. S6C), lysosomal 
proteolysis (fig. S6H), or extracellular protein- 
supported cell proliferation (fig. S6I). This is 
consistent with the accessory function of GNPTG 
(18) and the ability of M6P receptors to recog- 
nize both products and substrates of UCE (19). 

Because the lysosomal enzyme trafficking
pathway originates in the Golgi, we wondered
whether LYSET generally affected Golgi function.
LYSET-deficient cells did not display apparent
changes in Golgi morphology (fig. S7, A
and B) or the Golgi proteome (Fig. 3A). LYSET-deficient
cells were also not impaired in constitutive
protein secretion (Fig. 3I) or lysosomal
transport of lysosomal acid glucosylceramidase
(GBA), which uses an M6P-independent
trafficking pathway (Fig. 3B and fig. S5F).
Thus, LYSET is an essential and specific component
of the M6P pathway for lysosomal enzyme
trafficking.

modifications in organelle fractions (G) and supernatants upon NH4Cl-induced
secretion (H) in LYSET KO and GNPTAB KO MIA PaCa-2 cells, analyzed by IB.
(I) Changes in luminal lysosomal proteins and constitutively (const.) secreted
proteins in the secretome of LYSET KO and GNPTAB KO MEFs, quantified in cellular
supernatants by label-free MS (N = 5). Dark circles indicate adjusted P value <
0.05. In (C) to (E) and (H), an asterisk denotes immature proenzymes.

LYSET deficiency causes lysosomal storage
disorder phenotypes 
Hereditary mutations in GNPTAB that lead to a 
loss or reduction in GlcNAc-1-phosphotransferase 
activity cause mucolipidosis type II and III,
respectively—autosomal recessive lysosomal 
storage disorders characterized by varying 
degrees of skeletal abnormalities, cardiomegaly, 
and developmental delay (20). Because LYSET 
KO cells were deficient in M6P modifications, 
we wondered whether they displayed similar 
lysosomal storage disorder phenotypes. Electron 
microscopy of LYSET-deficient MEFs and MIA 
PaCa-2 cells revealed that lysosomes often con- 
tained electron-dense material in their lumen, 
indicative of impaired digestion of lysosomal 
contents (Fig. 4A). Moreover, lysosomes were 
often enlarged, more irregularly sized, and accu- 
mulated to high numbers in the cytosol (Fig. 4, 
B to D, and fig. S7C). The accumulation of lyso- 
somes in LYSET-deficient cells was also appar- 
ent by the strong increase in lysotracker-positive 
organelles (fig. S7, D to F). These lysosomal 
defects resembled the phenotypes of GNPTAB- 
deficient cells (Fig. 4, A to D, and fig. S7, C to F)
as well as the phenotypes of fibroblasts from 
mucolipidosis II and III patients (20, 21).
Recently, two LYSET variants, R45W and Y72X, 
have been linked to familial skeletal dysplasia 
syndromes reminiscent of mucolipidosis II
and III (22). To determine the functional effect 
of these disease-associated LYSET mutations, 
we stably expressed LYSET R45W, LYSET Y72X, or
wild-type LYSET in HEK 293T and MIA PaCa-2 
cells in which endogenous LYSET was deleted. 
In contrast to wild-type LYSET, which fully re- 
stored mature cathepsin levels and suppressed 
LC3-II accumulation (fig. S8A), LYSET R45W and
LYSET Y72X failed to rescue cathepsin and LC3-II
levels (Fig. 4E and fig. S8, B and C). Consistently,
cells expressing LYSET R45W or LYSET Y72X
displayed decreased lysosomal enzyme activities
(Fig. 4F) and accumulated lysotracker-positive
organelles (Fig. 4, G and H). LYSET R45W and
LYSET Y72X were barely detectable at the protein
level, conceivably because they abrogated LYSET 
function by destabilization and premature trun- 
cation of the protein, respectively (Fig. 4E and 
fig. S8, B and C). Phenotypically, this resembled 
disease-associated mutations in GNPTAB that 
destabilize the protein (fig. S8, C and D). Thus, 
loss-of-function mutations in LYSET lead to 
lysosomal storage disorder phenotypes at the 
cellular level that are characteristic of muco- 
lipidosis II and III. 


Retention of GlcNAc-1-phosphotransferase in Golgi
membranes depends on association with LYSET 


To understand the molecular function of LYSET 
in the generation of M6P groups, we investi- 
gated possible interactions with GNPTAB. 
LYSET was present in the Golgi (fig. S9A), where 
it colocalized with epitope-tagged GNPTAB 
(Fig. 5A). Proximity ligation yielded a strong 
signal between GNPTAB and LYSET (Fig. 5, B
and C) but not for either of the two proteins 
with other ubiquitous Golgi proteins (fig. S9B). 
AlphaFold modeling predicted a high-confidence 
interaction between LYSET and GNPTAB (fig. 
S9, C to F), which was validated by efficient, reciprocal
coimmunoprecipitation (co-IP) of transiently
overexpressed LYSET and GNPTAB
(Fig. 5, D and E) as well as endogenous LYSET 
and ectopically expressed GNPTAB (fig. S9, G 
and H). Stable isotope labeling with amino acids 
in cell culture (SILAC)-co-IP proteomics of the
endogenous proteins revealed that LYSET specifically
interacted with GNPTAB, GNPTG, and
several GlcNAc-1-phosphotransferase substrates
(Fig. 5F and data S5). These results establish
LYSET as a component of the Golgi-resident
GlcNAc-1-phosphotransferase complex.

Fig. 4. LYSET deficiency causes lysosomal storage disorder phenotypes. (A and B) Electron micrographs
showing undigested lysosomal luminal contents (A) and accumulation of lysosomes (B) in LYSET iKO and
GNPTAB iKO MEFs. Scale bars, 1 μm. (C and D) Quantification of lysosomal area (C) and lysosomal number (D) of
electron micrographs shown in (B). Data are represented as means ± SDs (control: 13 cells, 235 lysosomes;
LYSET KO, GNPTAB KO: 18 cells, >900 lysosomes). (E to H) Lysosomal phenotypes in LYSET KO HEK 293T cells
expressing LYSET wild type or the pathogenic variants R45W or Y72X. (E) LC3 and lysosomal enzyme levels,
analyzed by IB. (F) Lysosomal enzyme activities relative to control cells; data are represented as means ± SDs
(N = 3). (G) Lysotracker accumulation, quantified by flow cytometry; data are represented as replicate
means ± SEMs (N = 4). (H) Lysotracker accumulation, analyzed by microscopy. Scale bars, 10 μm. P values
were calculated by unpaired two-sided t test with Welch correction.

Next, we examined a potential role of LYSET 
in controlling the abundance of mature GNPTAB. 
Upon reaching the Golgi, the GNPTAB α/β precursor
is cleaved into α and β subunits by site-1
protease (S1P) (23). We monitored mature
GNPTAB by using an antibody that recognizes
the endogenous α subunit and by ectopically
expressing GNPTAB-myc, which generates an
epitope-tagged β subunit. LYSET-deficient cells
did not display a decrease in GNPTAB mRNA
(fig. S9I) or the α/β precursor protein (Fig. 5G).
However, LYSET-deficient cells displayed an
almost complete loss of the mature GNPTAB α
and β subunits (Fig. 5, G and H, and fig. S9J). β
subunit levels were also decreased in LYSET R45W
cells, consistent with the poor expression of
this mutant LYSET variant (fig. S9K). However,
LYSET R45W and GNPTAB still coimmunoprecipitated,
suggesting that LYSET R45W retains
the interaction with GNPTAB (fig. S9, K and L).
Conversely, GNPTAB-deficient cells did not 
display a decrease in LYSET expression (fig. 
S6, B and C) or Golgi localization (fig. S9M). 
Because these findings suggested that LYSET 
controls the abundance of GNPTAB at the 
posttranslational level, we examined GNPTAB 
processing. Radioactive pulse-chase experiments 
demonstrated that LYSET was dispensable for 
synthesis and proteolytic processing of ectopically
expressed GNPTAB α/β precursor
(fig. S10A). ATF6, another S1P substrate, was
also cleaved normally (fig. S10B). Moreover,
endoglycosidase Hf (Endo Hf) and N-glycosidase
F (PNGase F) treatments indicated that high- 
mannose and complex-type glycan modifica- 
tions of mature GNPTAB were not perturbed 
in the absence of LYSET (fig. S10C). 

Because GNPTAB processing appeared to be 
intact in LYSET-deficient cells, we reasoned 
that LYSET was required for stabilization of the 
mature protein. To test this, we sought to block
GNPTAB degradation by inhibiting lysosomal
proteases or the proteasome, two pathways
that degrade Golgi-derived proteins (24). In
LYSET wild-type cells, bafilomycin A1 and lysosomal
protease inhibitors led to a slight increase
in the GNPTAB β subunit (fig. S10D). In
LYSET-deficient cells, bafilomycin A1 stabilized
the GNPTAB α and β subunits (Fig. 5, H and I)
and restored GNPTAB levels in the Golgi (fig.
S10E), to a similar degree as previously observed
in cells expressing Golgi retention-deficient
GNPTAB mutants (25). S1P inhibition abrogated
accumulation of the GNPTAB α and β
subunits in LYSET-deficient cells treated with
bafilomycin A1, corroborating that cleavage of
the α/β precursor by S1P occurs in this context
(Fig. 5, H and I). Lysosomal protease inhibitors
also stabilized the α and β subunits (Fig. 5, J
and K), whereas proteasome inhibition did not
have any effect (fig. S10, F and G). Thus, in the
absence of LYSET, GNPTAB is lost by trafficking
to lysosomes and subsequent degradation
through residual protease activity.

Fig. 5. GlcNAc-1-phosphotransferase is retained in the Golgi apparatus
through association with LYSET. (A) Colocalization of LYSET and GNPTAB-Flag
in MIA PaCa-2 cells, analyzed by immunofluorescence (IF). Scale bars,
10 μm. DAPI, 4',6-diamidino-2-phenylindole. (B) Proximity ligation assay (PLA) of
LYSET and GNPTAB-myc in MIA PaCa-2 cells. Scale bars, 20 μm. (C) Quantification
of PLA signal shown in (B). Data are represented as means ± SDs (N = 10 fields of view
with ≥13 cells). P values were calculated by unpaired two-sided t test with Welch
correction. (D and E) Reciprocal co-IP of transiently expressed LYSET and
GNPTAB-myc in HEK 293T cells, analyzed by IB. (F) Endogenous interaction
MIA PaCa-2 cells, analyzed by SILAC-co-IP MS (N = 4).
(G) GNPTAB-myc levels in LYSET KO MIA PaCa-2 cells, analyzed by IB. (H and I)
Levels of GNPTAB α subunit in SKMEL-30 cells (H) and GNPTAB-myc in MIA
PaCa-2 cells (I) deficient for LYSET ± BafA1 (100 nM) and the S1P inhibitor
PF-429242 (5 μM) for 16 hours, analyzed by IB. (J and K) Levels of GNPTAB α
subunit in SKMEL-30 cells (J) and GNPTAB-myc in MIA PaCa-2 cells (K) deficient
for LYSET ± bafilomycin A1 (100 nM) or lysosomal protease inhibitors for 16 hours,
analyzed by IB. α/β-myc and β-myc denote the myc-tagged GNPTAB α/β
and β subunit, respectively. GNPT-α denotes the α subunit.

To understand how LYSET facilitates proper
localization of GNPTAB, we examined the
transmembrane domain of GNPTAB, which
consists of one N-terminal helix in the α subunit
(TM1) and one C-terminal helix in the β subunit
(TM2). TM1 of human GNPTAB contains mul- 
tiple charged or hydrophilic residues, whose 
biophysical properties destabilize helices in a 
hydrophobic membrane environment (Fig. 6, 
A and B). By contrast, TM1 is a regular hydro- 
phobic transmembrane helix in the GNPTAB 
homolog of Drosophila, which does not have 
LYSET. Notably, the unfavorable transmem- 
brane domain of GNPTAB co-occurs evolution- 
arily with LYSET: Although GNPTAB homologs 
in animals that lack LYSET contain hydropho- 
bic transmembrane helices, GNPTAB homologs 
in animals that have LYSET are predicted to 
insert poorly into membranes (Fig. 6C and 
data S6). At the sequence level, the presence of 
LYSET coincides with a conserved charged and 
hydrophilic patch in TM1 of GNPTAB (Fig. 6D). 
To test the functional relevance of this feature, 
we substituted two conserved charged or 
hydrophilic amino acid residues in TM1 of 
human GNPTAB (Q36 and E39; Fig. 6A) with 
leucine and investigated localization and func- 
tion of the mutant protein in LYSET-deficient, 
wild-type, and LYSET-overexpressing cells. As 
expected, wild-type GNPTAB was undetectable
in LYSET-deficient cells, whereas overexpression
of LYSET stabilized GNPTAB at elevated
levels (Fig. 6E). By contrast, GNPTAB Q36L,E39L
was highly expressed and localized to the Golgi
independently of LYSET (Fig. 6, E and F, and
fig. S10H). Consequently, GNPTAB Q36L,E39L fully
rescued lysosomal protein degradation and 
proliferation of LYSET-deficient cells that use 
extracellular proteins as nutrients (Fig. 6, G and 
H). Thus, replacing the unusual charged and 
hydrophilic motif of TM1 with hydrophobic 
residues restores GNPTAB function in the ab- 
sence of LYSET. 
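
The effect of swapping polar residues for leucine can be illustrated with a simple hydropathy calculation. The sketch below uses the Kyte-Doolittle scale, which is a simpler stand-in and not the ΔG predictor used in this study, and it operates on a made-up 10-residue helix (`toy_helix`), not on the GNPTAB TM1 sequence. The function names are likewise illustrative assumptions.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
# NOTE: a stand-in scale for illustration; the study itself predicts dG_app.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def total_hydropathy(sequence):
    """Sum the hydropathy values of all residues in a one-letter sequence."""
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence)


def substitute(sequence, position, expected, replacement):
    """Replace the residue at a 1-based position after checking its identity."""
    if sequence[position - 1] != expected:
        raise ValueError(f"expected {expected} at position {position}")
    return sequence[:position - 1] + replacement + sequence[position:]


# HYPOTHETICAL helix with one Gln and one Glu; not the real TM1 sequence.
toy_helix = "LLAQLLELLA"
mutant = substitute(substitute(toy_helix, 4, "Q", "L"), 7, "E", "L")

# Each Q->L or E->L swap raises the summed hydropathy by 3.8 - (-3.5) = 7.3.
gain = total_hydropathy(mutant) - total_hydropathy(toy_helix)
```

On this scale, both substitutions move the helix toward the hydrophobic profile expected of a conventional transmembrane segment, independent of the surrounding sequence.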


Discussion 


The above results identify LYSET as a core com- 
ponent of the M6P pathway that is required 
for lysosomal enzyme trafficking and, conse- 
quently, for lysosomal nutrient generation (fig. 
S11). LYSET is a small transmembrane protein 
of 131 amino acid residues (conserved isoform) 
that lacks discernible homology to other pro- 
teins. A clue for the molecular function of LYSET 
came from the unusual transmembrane do- 
main of its interaction partner GNPTAB, which 
contains multiple charged and hydrophilic amino 
acid residues that energetically disfavor membrane
integration. Structural modeling and
biochemical analyses indicate that association
of LYSET with the GNPTAB α and β
subunits provides the free energy to form a
stable transmembrane domain in the GlcNAc- 
1-phosphotransferase complex. LYSET is highly 
conserved in vertebrates, and clear homologs 
are present in several other metazoan phyla. 
The co-occurrence of LYSET with GNPTAB 
homologs that have an energetically disfavored 
transmembrane domain suggests that they be- 
came functionally linked during the evolution 
of the M6P pathway. Conceivably, controlling 
GlcNAc-1-phosphotransferase levels through 
LYSET constitutes a mechanism to regulate 
the lysosomal enzyme trafficking pathway. 
A deficiency in either LYSET or GNPTAB 
caused indistinguishable cellular phenotypes— 
accumulation of lysosomes that are depleted 
of catabolic enzymes, fail to degrade macro- 
pinocytic and autophagic cargoes, and contain 
undigested contents. Similar phenotypes were 
observed for mutant variants of LYSET that 
are associated with skeletal dysplasia syndromes, 
which resemble the lysosomal storage disorders
mucolipidosis II and III (22). The present
study provides a mechanistic explanation for
disease-associated LYSET variants and suggests
that loss-of-function mutations in LYSET
could underlie mucolipidosis II and III cases
that lack mutations in other components of
the GlcNAc-1-phosphotransferase complex. Two
recent CRISPR screens identified LYSET as
required for viral infections (26) and turnover
of lysosomal membrane proteins (27). These
findings further highlight the importance of
LYSET in diverse physiological and pathological
processes where lysosomal catabolism plays
a critical role.

Fig. 6. An unfavorable transmembrane domain renders GlcNAc-1-phosphotransferase
dependent on LYSET. (A and B) AlphaFold prediction of transmembrane
helix 1 (TM1) of human and fruit fly GNPTAB. In (A), negative charge is
indicated in red and positive charge in blue; in (B), hydrophobic residues are
indicated in ochre and hydrophilic residues in turquoise. (C) Prediction of the
apparent free energy (ΔGapp) for membrane insertion of GNPTAB transmembrane
helices 1 and 2 (TM1 and TM2) in animals that have or lack LYSET homologs. Box
plots show 5th to 95th percentiles. A negative ΔGapp corresponds to favorable
membrane insertion. (D) Sequence comparison of GNPTAB TM1 from animals that
have or lack LYSET homologs. (E) GNPTAB wild type and Q36L,E39L levels in LYSET
KO, wild-type, and LYSET-overexpressing (OE) cells, analyzed by IB. α/β-myc and
β-myc denote the myc-tagged GNPTAB α/β precursor and β subunit, respectively.
(F) GNPTAB wild type and Q36L,E39L localization in LYSET KO cells, analyzed by IF. Scale
bars, 10 µm. (G) DQ BSA degradation in LYSET-deficient cells expressing GNPTAB wild
type or Q36L,E39L (LL), quantified by flow cytometry. Data are represented as replicate
means ± SEMs (N = 3). (H) Proliferation of LYSET-deficient cells expressing GNPTAB
wild type or Q36L,E39L in leucine-deficient + 4% albumin medium. Data are represented
as means ± SDs (N = 3). Experiments were performed in MIA PaCa-2 cells.

Our findings reveal an essential function of 
LYSET and the M6P pathway for cells that de- 
pend on lysosomal catabolism to generate nu- 
trients. One context in which this metabolic 
state arises is in solid tumors, where limited or 
dysfunctional vasculature creates nutrient- 
deficient regions. To cope with such austere 
environments, cancer cells commonly use ly- 
sosomal protein catabolism to acquire amino 
acids (1-3). Several oncogenic signaling pathways
promote cellular processes that supply
macromolecular substrates to lysosomal catab- 
olism, thereby enhancing the metabolic flexibil- 
ity and resilience of cancer cells. In particular, 
oncogenic Ras signaling induces macropino- 
cytosis of extracellular proteins as exogenous 
nutrients (4) and elevates basal autophagy for 
recycling of intracellular constituents (28). 
Genetic ablation of LYSET potently suppresses 
lysosomal degradation of macropinocytic and 
autophagic cargoes but does not impair cell 
viability and growth under nutrient-replete 
conditions. Thus, strategies to inhibit LYSET 
and the lysosomal enzyme trafficking path- 
way may provide a promising entry point to 
therapeutically exploit metabolic adaptations 
in cancer. 


Methods summary 


Cell lines used in most experiments were MIA 
PaCa-2, HEK 293T, and MEFs. Cell culture ex- 
periments in different nutrient conditions were 
performed with amino acid-free, glucose-free 
Dulbecco’s modified Eagle’s medium: nutrient 
mixture F-12 (DMEM/F-12) reconstituted with 
glucose and all amino acids except leucine at 
standard DMEM/F-12 concentrations. For MIA 
PaCa-2 cells, leucine-free medium was supple- 
mented with 4% albumin, 10% fetal bovine serum 
(FBS), and 5 µM leucine, which supports continuous
proliferation. For other cell lines, leucine-free
medium was supplemented with 3% albumin
and 10% dialyzed FBS. In proliferation assays, 
cell numbers were determined using a CASY 
cell counter or a Guava easyCyte flow cyto- 
meter. In competitive proliferation assays, 
sgRNA-positive cells were cocultured with non- 
sgRNA-expressing controls, and percentages 
of sgRNA-positive cells were quantified using 
a Guava easyCyte flow cytometer. 

A MIA PaCa-2 iCas9 clone transduced with 
the genome-wide Vienna sgRNA library has 
been described previously (17). Cas9 expression
was induced with doxycycline for 3 days
before the screen. On day 0, cells were plated
in four media conditions that were either rich
or deficient in monomeric leucine and/or albumin
as two distinct leucine sources [amino
acid (AA)-rich ± 4% albumin, Leu-deficient ±
4% albumin]. Samples were collected on day 0
and after ~9 and 14 population doublings or 
after an equivalent period under selection in 
the different nutrient conditions. Cells corre- 
sponding to >600-fold library representation 
were collected, and next-generation sequenc- 
ing libraries were prepared, pooled, and se- 
quenced on an Illumina HiSeq 2500 platform. 
Depletion and enrichment of sgRNAs were 
calculated as described previously (17) using
the MAGeCK algorithm (29) for each condition 
and time point compared with day 0. Time 
points of the same condition were merged, rep- 
resenting for each gene the maximal effect (more 
significant P value). For the conditions AA-rich 
and Leu-deficient + 4% albumin, changes in 
sgRNA abundance were also calculated by di- 
rect comparison of the end points. 
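
The merging of time points described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code used in the study: each time point is assumed to be a dictionary mapping a gene to a `(log_fold_change, p_value)` pair, the gene names and numbers are placeholders, and `merge_time_points` is a hypothetical helper name.

```python
def merge_time_points(results_by_time_point):
    """Keep, for each gene, the entry with the more significant (smaller) P value."""
    merged = {}
    for results in results_by_time_point:
        for gene, (log_fold_change, p_value) in results.items():
            # Replace the stored entry only if this time point is more significant.
            if gene not in merged or p_value < merged[gene][1]:
                merged[gene] = (log_fold_change, p_value)
    return merged


# Placeholder values for two time points of one condition (not real data).
early = {"GENE_A": (-1.2, 0.03), "GENE_B": (0.1, 0.6)}
late = {"GENE_A": (-2.0, 0.001), "GENE_B": (0.2, 0.8)}

merged = merge_time_points([early, late])
```

Taking the minimum P value per gene favors sensitivity: a gene that drops out only at the later time point is still reported with its strongest effect.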

For immunoblotting, endogenous and ec- 
topically expressed GNPTAB were analyzed in 
organelle-enriched fractions. Cells were ho- 
mogenized with a Dounce tissue grinder. From 
postnuclear supernatants (PNS), organelle frac- 
tions were prepared by centrifugation at 18,000 g 
for 30 min or at 100,000 g for 60 min. Organelles 
were resuspended in lysis buffer and subjected 
to immunoblotting. For analysis of GNPTAB 
glycosylation, equal protein amounts of PNS 
were treated with the glycosidases Endo Hf
or PNGase F for 3 hours. M6P-modified proteins
were analyzed in organelle fractions and in
supernatants upon induced secretion with
10 mM NH4Cl using an M6P-scFv antibody (15).

Lysosomal DQ BSA fluorescence dequench- 
ing and lysotracker accumulation were quan- 
tified by live imaging or flow cytometry. Cells 
were incubated with DQ BSA for 6 hours and 
with lysotracker for the final 1 hour. For live im- 
aging, Hoechst was also added for the final hour. 
Activities of α-mannosidase and β-galactosidase
were assayed with the substrates 4-nitrophenyl-α-D-mannopyranoside
and 4-nitrophenyl-β-D-galactopyranoside,
respectively; absorbance
of liberated p-nitrophenol was measured at
405 nm. Activities of cathepsins B and L were
measured with the substrates Z-Arg-Arg-7-amido-4-methylcoumarin
and Z-Phe-Arg-7-amido-4-methylcoumarin,
respectively; fluorescence of
liberated 7-amido-4-methylcoumarin was measured
with excitation at 365 nm and emission
at 440 nm. 

For lysosomal fractionation, lysosomes were 
loaded with ferromagnetic nanoparticles 
(DexoMAG C) and enriched over LS MACS 
columns. For secretome analysis, cells were 
cultured in OptiMEM for 24 hours, and me- 
dia were collected and cleared by centrifu- 
gation. For SILAC-based co-IPs, LYSET wild-type 
and KO cells were labeled with medium or heavy
arginine and lysine isotopes for three passages;
isotope labels were swapped between
genotypes in different experimental replicates 
to avoid isotope bias. Cell lysates were pooled 
and immunoprecipitated using an antibody 
that recognizes endogenous LYSET. Sample 
preparation for proteomics was performed 
through tryptic in-gel digestion. LC-MS analysis 
was performed on an Ultimate 3000 ultraper- 
formance liquid chromatography (UPLC) sys- 
tem connected to an Orbitrap Exploris 480 mass 
spectrometer using data-dependent acquisition 
(DDA) mode. Data analysis of LFQ and SILAC 
experiments was performed with MaxQuant 
(30) using an organism-specific database ex- 
tracted from UniProt under default settings. 
The statistical analysis for LFQ values and SILAC 
ratios was performed with the R-package limma 
(31). P values were adjusted with the Benjamini- 
Hochberg method for multiple testing. 
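
For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, a minimal pure-Python sketch is given here. It illustrates the standard procedure and is not the code used in the study, where the adjustment was carried out in R; the function name `benjamini_hochberg` and the example P values are assumptions for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Illustrative P values only (not data from the study).
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each raw P value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotone, which controls the false discovery rate and not the family-wise error rate.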

For structural modeling, structures of human 
and Drosophila GNPTAB were predicted five 
times using a local version of ColabFold (32); 
transmembrane helix 1 of the model with the 
best predicted local distance difference test 
(pLDDT) scores was used. To obtain a model 
of the transmembrane region of the complex 
between human LYSET and GNPTAB, the 
analysis was restricted to the immediate mem- 
brane region, which yielded high pLDDT and 
predicted aligned error (PAE) scores, indicating 
reliable models. For ΔGapp prediction, GNPTAB
orthologs were identified using National Center 
for Biotechnology Information (NCBI) BLAST 
in the UniProt reference proteomes database 
or the NCBI nonredundant protein database. 
LYSET or TMEM251 orthologs were collected 
in a search using the Pfam TMEM251 hidden 
Markov model in the NCBI nonredundant 
protein database (33). Species were grouped 
by the presence or absence of a detectable 
TMEM251 ortholog. GNPTAB transmembrane 
helices 1 and 2 were predicted with TMHMM 
(34). The apparent free energy difference, ΔGapp,
for insertion of the transmembrane sequence
into the endoplasmic reticulum membrane was
predicted with ΔG predictor (35).

For subcutaneous tumor-growth experiments, 
suspensions of LYSET KO and control cells 
(EPP2, MC-38, or MIA PaCa-2) were injected 
into the shaved flanks of anesthetized mice. 
Tumor size was monitored by caliper measure- 
ments two to three times a week. Necropsies 
were taken when termination criteria were 
reached as defined by the animal protocol. For 
orthotopic competition assays, LYSET KO and 
control KPC cells were pooled in a ratio of 
12.5:1 and injected into the pancreas of anes- 
thetized mice. Animals were euthanized 14 days 
after orthotopic injection, and the ratio of 
LYSET KO to control cells was quantified by 
fluorescence-activated cell sorting (FACS). A 
full description of the materials and methods, 
including primer and sgRNA sequences (tables
S1 to S3), is provided in the supplementary
materials.

INTRODUCTION: Lysosomes are key degradative
compartments within the cell that are
essential to maintain protein homeostasis. 
Their dysfunction causes over 70 rare genetic 
diseases collectively known as lysosomal stor- 
age disorders (LSDs). Intracellular sorting of 
most soluble lysosomal enzymes occurs by 
tagging with mannose 6-phosphate (M6P) 
residues in the Golgi apparatus, which are 
recognized by specific receptors that direct 
transport to the endosomal/lysosomal system. 
GlcNAc-1-phosphotransferase catalyzes the first
step in M6P-tagging. Inherited loss of GlcNAc-1-
phosphotransferase function causes the severe
LSD mucolipidosis type II (MLII). The M6P
signal-mediated lysosomal sorting pathway 
is well studied and thought to be completely 
understood. However, it remains unknown 
whether additional critical components exist. 


RATIONALE: Certain viruses program success- 
ful entry into cells by co-opting lysosomal 
cathepsin proteases to cleave and activate viral 
structural proteins allowing delivery of their 
genome into the cytoplasm. This infection 
strategy is shared among different virus 
families including reovirus, Ebola virus, and 
severe acute respiratory syndrome coronavirus 2
(SARS-CoV-2). Therefore, these viruses
are sensitive probes for lysosomal function. To
identify genes important for lysosomal homeostasis,
we performed genome-scale CRISPR screens
using susceptibility to reovirus infection as
phenotypic selection.

LYSET is an essential component of the M6P lysosomal transport pathway. By using genome-scale
genetic screens for viral infection we identified LYSET as a protein required for lysosome biogenesis. LYSET
controls GlcNAc-1-phosphotransferase (GlcNAc-1-PT) function by binding to and retaining it in the Golgi
apparatus. In LYSET KO cells M6P tagging is severely disrupted resulting in strong resistance to infection by
certain viruses, aberrant secretion of enzymes normally present in the lysosome, and abnormally large
lysosomes filled with storage material as a result of drastically reduced levels of active hydrolytic enzymes.


RESULTS: The genetic screens identified 
TMEM251—a small, uncharacterized protein— 
as an essential component of lysosomal bio- 
genesis. Cells with knockout (KO) mutations 
in TMEM251 were refractory to infection by 
reovirus, SARS-CoV-2, and vesicular stoma- 
titis virus pseudotyped with the Ebola virus 
glycoprotein. This was due to strongly reduced 
activity of lysosomal cathepsin proteases. More- 
over, quantitative proteomics revealed a severe 
global sorting defect in which intracellular 
enzymes destined for lysosomal delivery were 
instead secreted into the medium. Thus, we 
renamed TMEM251 to LYSET (for lysosomal 
enzyme trafficking factor). Mechanistically, 
we showed that LYSET binds to GlcNAc-1- 
phosphotransferase in the Golgi apparatus and 
controls its stability. LYSET KO caused aber- 
rant localization of GlcNAc-1-phosphotransferase 
to lysosomes and subsequent degradation, re- 
sulting in M6P tagging failure. LYSET KO mice 
displayed typical diagnostic features of MLII
including elevated levels of lysosomal enzymes 
in blood serum and enlarged lysosomes with 
accumulation of electron dense material in 
isolated cells. Recently, an MLII-like genetic 
disorder in patients carrying biallelic muta- 
tions in LYSET has been described. We showed 
that complementation of LYSET KO cells with 
these pathogenic mutants failed to rescue lyso- 
somal sorting defects. 


CONCLUSION: Our work identifies LYSET as 
an indispensable component of the biosyn- 
thetic pathway that directs transport of sol- 
uble enzymes to the lysosome. As such, LYSET 
is essential for entry of diverse, highly patho- 
genic viruses that rely on endo-lysosomal 
activation by cathepsins. We uncovered an 
unexpected molecular mechanism in which 
LYSET regulates GlcNAc-1-phosphotransferase 
function by binding to and retaining it in 
the Golgi apparatus. The key role of LYSET 
in lysosome biogenesis likely explains MLII- 
like symptoms observed in patients with 
LYSET mutations. Altogether, our findings 
provide insights in fundamental cell physi- 
ology with relevance for human inherited 
disease and viral infection.

Lysosomes are key degradative compartments of the cell. Transport to lysosomes relies on GlcNAc-1-phosphotransferase-mediated
tagging of soluble enzymes with mannose 6-phosphate (M6P). GlcNAc-1-phosphotransferase
deficiency leads to the severe lysosomal storage disorder mucolipidosis II (MLII).
Several viruses require lysosomal cathepsins to cleave structural proteins and thus depend on functional 
GlcNAc-1-phosphotransferase. We used genome-scale CRISPR screens to identify lysosomal enzyme 
trafficking factor (LYSET, also named TMEM251) as essential for infection by cathepsin-dependent viruses 
including severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). LYSET deficiency resulted in global 
loss of M6P tagging and mislocalization of GlcNAc-1-phosphotransferase from the Golgi complex to lysosomes.
Lyset knockout mice exhibited MLII-like phenotypes, and human pathogenic LYSET alleles failed to restore 
lysosomal sorting defects. Thus, LYSET is required for correct functioning of the M6P trafficking machinery and 
mutations in LYSET can explain the phenotype of the associated disorder. 


Viruses have evolved to hijack the cellular
endocytic machinery to enter the cell (1).
Lysosomal cathepsin proteases mediate 
the stepwise proteolytic disassembly of 
reovirus particles, which is essential 
for infection (2). Cathepsins also cleave viral 
glycoproteins of several enveloped viruses 
during viral entry, triggering productive infection.
These include highly pathogenic viruses
such as the filovirus family member Ebola
virus (EBOV) and coronaviruses including
severe acute respiratory syndrome corona- 
virus (SARS-CoV), SARS-CoV-2, and Middle 
East respiratory syndrome (MERS) corona- 
viruses (3-6). These viruses are thus sensitive 
probes of lysosomal function. 


LYSET is a cellular factor essential for infection 
by diverse viruses including SARS-CoV-2 


To identify genes critical for reovirus infection, 
we performed a genome-scale CRISPR-Cas9 
screen in human glioblastoma cells (U87MG). 
In these cells, productive viral infection leads 
to cell death. After CRISPR-Cas9 mutagenesis 
with the Brunello library (7), which covers 
19,114 genes, cells were infected with reovirus 
type 3D (ReoT3D). To identify protective gene 
mutations, we retrieved the guide RNA (gRNA) 
sequences present in the resistant population 
and compared them with the unselected popu- 
lation (fig. S1A). In line with its essential role in 
reovirus entry, the gene encoding cathepsin L 
was highly enriched in the resistant popula- 
tion (Fig. 1A and table S1) (8). Consistent with 
the importance of mannose 6-phosphate (M6P)- 
dependent lysosomal transport for cathepsin 
activity (9), genes encoding the α/β (GNPTAB)
(10) and γ (GNPTG) (11) subunits of GlcNAc-1-
phosphotransferase scored highly, as did site-1
protease (S1P), a Golgi-localized protein required
for the activation of GlcNAc-1-phosphotransferase
(12). We did not identify M6P receptors (MPR),
likely owing to functional overlap between the
two M6P receptor types, cation-dependent and
cation-independent MPR (13). The second highest
hit was TMEM251, a gene encoding a largely
uncharacterized transmembrane protein. Based 
on the results outlined below, we renamed 
this gene lysosomal enzyme trafficking factor 
(LYSET). A separate genome-scale CRISPR 
screen in eHAP cells also identified deletion 
of GNPTAB and LYSET as highly protective 
against reovirus infection, corroborating the 
essentiality of M6P-mediated lysosomal pro- 
tein sorting and LYSET for reovirus infection 
(fig. S1B and table S2). To validate and charac- 
terize the role of LYSET, GNPTAB, and S1P in 
viral infection, we generated clonal knockout 
(KO) cell lines in U87MG and 293FT cells using 
CRISPR-Cas9 (fig. S2). Knockout of LYSET, 
GNPTAB, and S1P resulted in strong protection
against cell death following reovirus infection in 
both cell types (Fig. 1B). Intracellular viral RNA 
levels and virus production were severely re- 
duced, suggesting an early block in viral entry or 
replication (Fig. 1, C and D). Because GNPTAB, 
GNPTG, and S1P have known roles in the sort- 
ing of lysosomal cathepsins, we reasoned that 
LYSET might act similarly. We first tested 
whether LYSET deficiency would prevent viral 
entry of other cathepsin-dependent viruses 
(3). As a faithful model of EBOV entry (14), we
used a GFP-encoding vesicular stomatitis virus 
pseudotyped with the EBOV glycoprotein (VSV- 
EBOV-GP). Whereas parental 293FT cells were 
susceptible to infection with VSV-EBOV-GP 
as evidenced by a time-dependent increase in 
GFP fluorescence, LYSET KO cells were highly 
refractory to infection (Fig. 1E, fig. S3A, and 
movies S1 and S2). Similar results were ob- 
tained using clonal HAP1 LYSET KO cells and 
pooled knockouts in human skin fibroblasts 
(fig. S3, B to D, and movies S3 to S6). SARS- 
CoV-2 requires activation of its spike protein 
during viral entry, which can be mediated by 
active endosomal or lysosomal cathepsins 
or by the transmembrane serine protease 
TMPRSS2 (5). In cells with very low TMPRSS2 
expression, cathepsins become essential for 
SARS-CoV-2 entry. Whereas parental A549- 
ACE2 cells were highly susceptible to infection 
by VSV pseudotyped with the SARS-CoV-2 
spike protein, deletion of LYSET strongly re- 
duced infection levels during the time course of 
infection (fig. S3, E and F, and movies S7 and
S8). To validate these results in the context of 
the authentic virus, we used an infectious mo- 
lecular clone of SARS-CoV-2 that expresses 
the mNeonGreen fluorescent protein (15). Consistent
with the results of the pseudotyped
viruses, we observed a substantial decrease in 
infection (Fig. 1F and fig. S3G). In cells expres- 
sing TMPRSS2, most SARS-CoV-2 variants pref- 
erentially use this route of entry (16). However, 
the recently emerging omicron variant is less 
efficiently cleaved by TMPRSS2 and more de- 
pendent on cathepsin-mediated entry than 
other variants of concern such as the delta 
variant (16, 17). We thus tested lentiviral particles
pseudotyped with SARS-CoV-2 spike variants
in cells that expressed or did not express
TMPRSS2. Consistent with our results with
spikes corresponding to the early Wuhan strain,
LYSET KO resulted in decreased entry of both
delta and omicron variant-pseudotyped viruses
in cells that did not express TMPRSS2 (Fig. 1G).
In cells expressing TMPRSS2, the delta variant
was only moderately affected by LYSET KO
whereas the omicron variant was still strongly
dependent on LYSET (Fig. 1G). Thus, our genetic
screens identified LYSET as a host factor
essential for a diverse group of viruses that
depend on endosomal protease activation
during cell entry.

Fig. 1. LYSET is a critical host factor for multiple viruses that use activated
cathepsins for entry. (A) CRISPR screen for reovirus T3D (ReoT3D) host
factors in U87MG cells. Significance of enrichment was determined through
MAGeCK analysis (y axis). All genes are plotted as bubbles on the x axis, each
representing a specific gene. Each subset is color-coded according to function
and labels show gene names. (B) Crystal violet staining after infection with
ReoT3D representative of n = 3 biologically independent replicates. mock,
noninfected controls. (C) RT-qPCR quantification of ReoT3D RNA in infected
U87MG and 293FT cells at multiplicity of infection (MOI) 1 at 72 hours post
infection (hpi) (mean ± SEM, n = 3). (D) U87MG or 293FT cells were infected with
MOI of 1 ReoT3D virus. At 72 hpi, cells were lysed and viral titers determined
through plaque assay [mean ± SEM, n = 3, ***P < 0.001, ****P < 0.0001;
significance determined through one-way ANOVA with post-hoc Dunnett's
multiple comparisons test for (C) and (D)]. (E) Time course of VSV-EBOV-GP
infection of 293FT cells WT and LYSET KO (mean ± SEM, n = 3). (F) Bar
graph depicting independent infections of A549-ACE2 cells ± LYSET KO with
SARS-CoV-2-mNeon using flow cytometry (mean ± SD, n = 3, ****P < 0.0001;
significance determined through unpaired, two-tailed Student's t-test).
(G) Infection of SARS-CoV-2 Delta and Omicron spike reporter virus particles
in Huh7.5.1 cell lines with or without LYSET KO. Cells were engineered to
stably express ACE2 or ACE2 in combination with TMPRSS2, as indicated.
Luciferase activity was measured at 72 hpi and normalized to WT cells
(mean ± SEM, n = 6, ***P < 0.001, ****P < 0.0001; significance was determined
by unpaired, two-tailed Student's t-test with Welch's correction).


LYSET is critical for M6P-mediated lysosomal
enzyme transport

The identification of a previously uncharacter- 
ized transmembrane protein potentially involved 
in cathepsin sorting was unexpected as the key 
proteins involved in lysosomal targeting are 
well established (13). To investigate whether 
LYSET affected lysosomal trafficking of cathep- 
sins, we analyzed cathepsin B (CTSB) protein 
levels in 293FT and U87MG wild-type (WT) and 
KO cells (Fig. 2A and fig. S4A). In WT cells, most 
CTSB was present in its mature form following 
autocatalytic cleavage in lysosomes. Only low 
levels were found in the extracellular medium, 
indicating efficient lysosomal sorting and trafficking.
By contrast, LYSET deficiency resulted
in aberrant secretion of CTSB precursor forms 
into the medium concomitant with a near com- 
plete loss of mature CTSB in the cell lysate 
(Fig. 2A and fig. S4A). The dysregulation of 
cathepsin transport was equally pronounced 
as observed after disruption of core M6P formation
components (GNPTAB and S1P). We
extended the analysis to a larger panel of
cathepsins in 293FT and HAP1 cells (fig. S4, B
and C) and measured cathepsin activity using
a quenched cell-permeable probe (BMV-109)
that covalently links to active cysteine cathepsins
and gains fluorescence (18).

Fig. 2. LYSET is required for global lysosomal enzyme transport through M6P-tagging. (A) Immunoblot
analysis of CTSB in lysates (intracellular) and in extracellular medium (secreted) from WT cells and clonal 293FT
cell lines containing KO mutations in indicated genes. (B) Cathepsin activity in cells was determined by
quantification of BMV-109 fluorescence signal in live 293FT WT and KO cells as the raw corrected total cellular
fluorescence (CTCF) in arbitrary fluorescence units (AFU) (mean ± SD, n = 100 cells, ****P < 0.0001; significance
determined through one-way ANOVA with a post-hoc Dunnett's multiple comparisons test). (C) Proteomic
analysis of WT and LYSET KO MEFs by unbiased DIA. DIA was used for intracellular and secreted proteins.
(D) Z-score analysis of individual peptides that contain the M6P moiety as determined using glycoproteomics.
Peptides were derived from indicated lysosomal proteins in WT, LYSET KO, and GNPTAB KO 293FT cells;
n = 3 replicates for each cell line. (E) Immunoblot analysis of M6P-tagged proteins from 293FT WT, GNPTAB KO,
S1P KO, and LYSET KO cells using an M6P-specific, single-chain antibody fragment (M6P).

Compared
with WT cells, LYSET KO cells displayed a
strongly decreased fluorescence after live cell 
labeling suggesting a global loss of cysteine 
cathepsin protease activity (Fig. 2B and fig. 
S4, D and E). This indicated a severe defect in 
lysosomal protein targeting because activity 
requires autocatalytic cleavage in lysosomes. 
To investigate whether the missorting was 
specific to cathepsins or pointed to a more 
general defect, we analyzed the consequences 
of LYSET deficiency for additional lysosomal 
proteins and in different cellular contexts. We 
consistently found large increases in secretion 
of typical soluble lysosomal proteins into the 
extracellular medium although their intracel- 
lular protein levels and proteolytic maturation 
were strongly decreased in KO cells generated
in primary human fibroblast, U87MG, 293FT,
HAP1, and SK-MEL-30 cells (fig. S5). In addition,
we observed an increase in LC3B type II
levels associated with autophagic membranes 
in cell lysates (fig. S5C). Autophagy markers 
or the accumulation of an autophagosome- 
specific dye were markedly elevated in LYSET 
and GNPTAB KO cells compared with WT cells 
under basal conditions and not further ele- 
vated by blocking lysosomal hydrolase activities 
with chloroquine (fig. S6A) or a combination of 
chloroquine and rapamycin (fig. S6B). This was 
expected because the content of autophagic 
vesicles is degraded by lysosomal hydrolases 
upon fusion with lysosomes (19). To more comprehensively
assess the effects of LYSET knockout
on a primary cell type, we used quantitative
proteomic approaches in mouse embryonic 
fibroblasts (MEFs). We first generated C57BL/ 
6J mice with deletions in Lyset by gene edit- 
ing. We designed sgRNAs to excise a region in 
the second exon of Lyset resulting in two 
mouse lines with distinct out of frame deletion 
variants (of Δ184 and Δ199 nucleotides) (fig. S7,
A to C and E). To generate mice with Lyset gene
deletions (referred to as Lyset KO), mice were
bred as compound heterozygotes (Δ184/Δ199)
or homozygotes. We verified loss of Lyset pro- 
tein expression in different tissues (fig. S7D). 
To investigate the extracellular proteome, we 
collected and concentrated serum-free con- 
ditioned medium from WT and Lyset KO MEFs. 
In parallel, we prepared cell lysates to deter- 
mine the intracellular proteome. Using data 
independent acquisition (DIA) mass spec- 
trometry (fig. S8A) (20), we detected and 
quantified more than 4000 proteins in the 
intracellular proteome and more than 1000 
proteins in the secretome, showing a high cor- 
relation between replicates (fig. S8, B to E, and 
table S3). Only a small subset of proteins were
differentially expressed in Lyset KO versus WT 
cells. Most of the proteins found in higher 
abundance in the secretome of Lyset KO MEFs 
were well-characterized luminal lysosomal pro- 
teins (indicated in red) whereas these proteins 
were depleted from the intracellular proteome 
(Fig. 2C and fig. S9A). The protein levels of two 
lysosomal enzymes known to traffic indepen- 
dently from M6P (Gba and Acp2) were unaf- 
fected (Fig. 2C), which we confirmed using 
specific activity assays (fig. S10). Moreover, mem- 
brane lysosomal proteins (indicated in blue) 
were unaffected (Fig. 2C). Similar results were 
obtained using parallel reaction monitoring 
(PRM), a targeted method that provides more 
sensitive quantification of a smaller subset of 
lysosomal proteins (fig. S9B) (20). Thus, LYSET 
deficiency results in a severe defect in lysosomal 
trafficking that is specific for M6P cargoes. 
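
As a small worked check of the gene-editing design described above, the sketch below verifies that deletions of 184 and 199 nucleotides are both out of frame, that is, not multiples of three. It is an illustration only and not part of the authors' methods; the function name and dictionary keys are hypothetical.

```python
def is_out_of_frame(deleted_nt: int) -> bool:
    """A deletion shifts the reading frame when its length is not a multiple of 3."""
    return deleted_nt % 3 != 0


# Deletion sizes (in nucleotides) reported for the two Lyset mouse lines.
deletions = {"184 nt": 184, "199 nt": 199}

# Map each deletion to whether it disrupts the reading frame.
frame_status = {name: is_out_of_frame(size) for name, size in deletions.items()}
print(frame_status)
```

Both deletions leave a remainder of one nucleotide, which is consistent with the description of the two variants as out of frame.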

Key steps in lysosomal enzyme sorting 
include modification with M6P by GlcNAc-1- 
phosphotransferase in the Golgi apparatus 
and the subsequent binding to M6P-specific 
receptors that mediate the transport to the 
lysosome (13). To distinguish between an early 
defect in tagging or a later defect in M6P rec- 
ognition, we used an unbiased proteomic ap- 
proach to detect M6P modifications directly 
on glycoproteins isolated from WT cells and 
cells deficient in LYSET. This method, based 
on immobilized metal ion affinity chromatog- 
raphy (Fe3*-IMAC), allows for enrichment of 
glycopeptides containing the negatively charged 
M6P modification followed by glycopeptide 
identification using mass spectrometry (fig. 
S11A) (21). In WT 293FT cells we readily identified
glycopeptides containing M6P from canonical
lysosomal enzymes (Fig. 2D, fig. S11B, and table
S4). As expected, these M6P glycopeptides 

Fig. 3. LYSET binds to GNPTAB and colocalizes with GNPTAB/GNPTG in Golgi apparatus cisternae.
(A) Immunofluorescence microscopy in HAP1 cells showing colocalization of LYSET with the cis-Golgi marker
GM130 along with GNPTAB. Scale bar, 10 μm. (B) Transmission electron microscopy (TEM) immunogold
staining (15 nm gold) shows LYSET in Golgi cisternae of SK-MEL-30 WT cells using ultrathin sections.
(C) TEM double immunogold staining on ultrathin sections for LYSET (15 nm gold) and GNPTG (10 nm gold)
in the Golgi apparatus of SK-MEL-30 WT cells. Arrowheads indicate colocalization. Scale bar, 200 nm.
(D) Immunoprecipitation (IP) on lysates of cells expressing epitope-tagged proteins using anti-FLAG
(left panel) or anti-MYC (right panel) magnetic beads, followed by immunoblot analysis with indicated
antibodies. Input lysates are also analyzed.


were absent in cells lacking GNPTAB. The 
LYSET KO cells displayed a similar absence of 
M6P-modified glycopeptides. This was con- 
firmed in an orthologous assay using a single- 
chain M6P-specific antibody fragment (22). 
Lysates of 293FT and HAP1 cells devoid of 
LYSET showed a large decrease in M6P im- 
munoreactive glycoproteins, comparable to 
decreases observed with GNPTAB KO and S1P
KO cells (Fig. 2E and fig. S11C). Thus, LYSET 
is essential for an early step in lysosomal 
enzyme transport and the generation of the 
M6P modification. 


LYSET binds to GNPTAB and regulates its 
function by mediating proper Golgi localization 


LYSET is a small, poorly characterized protein 
containing two transmembrane domains. It 
colocalizes with GNPTAB and GNPTG in the 
Golgi apparatus (Fig. 3, A to C, and fig. S12, A 
to C). We performed immunoprecipitation (IP) 
followed by immunoblotting experiments to 
test whether LYSET interacts with GNPTAB.
After coexpression of epitope-tagged LYSET 
and GNPTAB we detected GNPTAB-MYC in 
the LYSET-FLAG IP and LYSET-FLAG in the 
GNPTAB-MYC IP (Fig. 3D). These reciprocal 
co-IPs appeared to be specific as the endogenous 
Golgi protein Giantin was not detected and 
neither LYSET nor GNPTAB was found in control
IPs with epitope-tagged mNeonGreen. This
interaction suggests that LYSET plays a role 
in modulating the function of the GlcNAc-1- 
phosphotransferase complex. Because pathogenic 
mutations in GNPTAB leading to mislocaliza- 
tion of the GlcNAc-1-phosphotransferase com- 
plex have been reported (23, 24), we hypothesized 
that LYSET might function by retaining GNPTAB 
in the Golgi apparatus. We examined this by 
immunofluorescence analysis for endogenous 
GNPTAB and co-staining with antibodies for 
the Golgi apparatus (GM130) and lysosomes 
(LAMP2). The staining pattern in LYSET KO 
HAP1 cells was clearly distinct from WT cells.
Whereas in WT cells GNPTAB was localized
predominantly in the Golgi apparatus, LYSET
KO resulted in a loss of Golgi colocalization 
and a gain in localization to lysosomal struc- 
tures (Fig. 4A). To investigate this further, we 
characterized the expression of the endogenous 
GlcNAc-1-phosphotransferase complex in sub- 
cellular fractions. We used sequential centrif- 
ugation to isolate a 20K fraction enriched in 
lysosomes and a 100K fraction enriched in ER/ 
Golgi membranes (25). LYSET deficiency resulted 
in a near-complete loss of endogenous
GlcNAc-1-phosphotransferase α-subunit protein levels in
the 100K fraction (Fig. 4B). This loss was post- 
transcriptional because GNPTAB mRNA levels 
did not substantially differ between WT and 
LYSET KO cells (fig. S13). Inhibition of protea- 
somal degradation with epoxomicin did not 
rescue expression of GNPTAB in LYSET KO 
cells (fig. S14, A and B). By contrast, preventing 
lysosomal degradation by blocking organellar 
acidification (bafilomycin A1) or by protease 
inhibition (PI) (E64d/leupeptin/pepstatin A) 
increased GNPTAB protein levels specifically 
in the lysosome-enriched 20K fraction (Fig. 
4C and fig. S14, C to E). Moreover, following 
bafilomycin treatment we observed immuno- 
reactive bands of higher electrophoretic mobility 
likely corresponding to partial cleavage products 
(Fig. 4C), which was also observed in magnetite- 
purified lysosomes (fig. S14F). These results sug- 
gest that GNPTAB is degraded in lysosomes. 
Immunofluorescence microscopy in LYSET KO 
cells revealed that most endogenous GNPTAB 
remained colocalized with a lysosomal marker 
upon treatment with both inhibitors (fig. S14, 
G and H). However, in bafilomycin—but not 
PI-treated—cells some GNPTAB colocalized 
with the Golgi apparatus (fig. S14, G and H). 
The latter could be due to a disturbance of the 
pH within the Golgi apparatus by bafilomycin 
that could lead to aberrant post Golgi traf- 
ficking (26). These data support a model in 
which LYSET interacts with GNPTAB and plays 
a major role in proper localization of GNPTAB 
in Golgi stacks. In the absence of LYSET, GNPTAB 
is mislocalized to the lysosomes where it is 
degraded. 


LYSET’s role in lysosomal transport provides a 
disease mechanism for a previously described 
genetic disorder 


It has long been recognized that mutations in 
genes encoding the core components of lysosomal
enzyme trafficking cause specific inherited
syndromes including mucolipidosis II
(MLII) in which defective GNPTAB mutations 
are etiological (27, 28). Recently, biallelic LYSET 
mutations have been described in individuals 
from two families with a recessive genetic disorder 
with characteristics of mucopolysaccharidoses/ 
mucolipidosis (29). However, the mechanistic 
basis upon which to link LYSET with the 
disease was tentative. Our results suggest that 

Fig. 4. LYSET KO triggers mislocalization of GNPTAB to the lysosome
where it is degraded. (A) Colocalization of GNPTAB with GM130 and
LAMP2 in WT and LYSET KO cells and Manders colocalization quantification.
Colocalization analysis was performed on at least n = 12 separate
micrograph images. Scale bar, 10 μm. (B) Immunoblot analysis of GNPTAB
in 100K-enriched ER/Golgi fractions from WT, GNPTAB KO, and LYSET KO


defects in lysosomal protein sorting caused by 
LYSET deficiency might underlie this genetic 
disorder. Elevated serum level of lysosomal enzymes
is a defining feature of MLII. The measurement
of these enzymes in serum is used
diagnostically to distinguish MLII from other 
metabolic diseases that cause similar clinical 
features (28). Compared with WT mice, serum 
from Lyset KO mice showed markedly higher 
enzyme activities of all tested lysosomal enzymes 
including α-mannosidase, β-hexosaminidase,
α-L-fucosidase, and β-galactosidase (Fig. 5A).
Furthermore, electron microscopy (EM) analy- 
sis of Lyset KO MEFs showed obvious morpho- 
logical changes in lysosomes (Fig. 5B). Lysosomes 
with electron-dense material accumulated in 
the cytoplasm, another characteristic feature 
of lysosomal storage disorders (28). Quanti- 
tative analysis of EM images revealed that 
lysosomes were considerably larger and more 
numerous in Lyset KO MEFs although the cell 
area was not substantially different compared 
with WT cells (Fig. 5C and fig. S15, A and B). 
In line with this, aberrant lysosomes were 
also observed in human cells lacking LYSET 
(fig. S16). In addition, Lyset KO MEFs were 
resistant to infection by VSV-EBOV-GP, rein- 
forcing our results in human cell lines (Fig. 5D 
and movies S9 and S10). Despite showing char- 
acteristic MLII phenotypes including increased 
lysosomal enzyme serum levels and storage 


materials in lysosomes, Lyset KO mice did not 
display obvious clinical symptoms as observed 
in human patients with LYSET mutations. Sim- 
ilarly, symptoms in the Gnptab KO mouse are 
less severe than in human MLII disease (30). 
To link the human patient mutations in 
LYSET (29) directly to lysosomal protein sorting 
we first established that lentiviral complemen- 
tation of WT LYSET (isoforms 1 and 2) could 
restore CTSB missorting and maturation in 
293FT LYSET KO cells (fig. S17). Subsequently, 
we tested complementation by LYSET contain- 
ing the Y72Ter or R45W pathogenic mutations 
(29). As controls we used WT LYSET as well 
as three nonsynonymous LYSET variants not 
associated with disease, which occur frequently 
in the population. R45W displayed slightly lower 
protein expression levels. As expected, Y72Ter 
was not detected because this frameshift muta- 
tion leads to a premature stop codon. While all 
controls corrected CTSB missorting, Y72Ter and 
R45W failed to do so (Fig. 6A). Similarly, R45W 
expression did not rescue the lysosomal storage 
defects observed using electron microscopy in 
LYSET KO cells (fig. S18). Moreover, the patho- 
genic allele R45W did not rescue the loss of 
endogenous GNPTAB in 100K fractions (Fig. 6B). 
Coimmunoprecipitation experiments showed 
a severe defect in R45W in binding with 
GNPTAB (Fig. 6C and fig. S19). Because R45W 
fails to rescue the lysosomal trafficking defect, 

HAP1 cells. The antibody is specific for the α-subunit domain of GNPTAB.
Golgi-marker (GM130) and ER-marker (PDI) proteins were used as loading
controls. (C) Immunoblot analysis of 20K and 100K fractions from WT and
LYSET KO HAP1 cells with or without bafilomycin A1 (BafA1) treatment.
LAMP2, GM130 and GOLGIN-97 proteins were used as controls for
preparation and loading.


these results suggest that the interaction of 
LYSET with GNPTAB is critical for GNPTAB 
function and that mutations in LYSET that 
affect this interaction can contribute to disease 
development. Thus, LYSET deficiency causes 
several defining features of mucolipidosis II
and patient mutant alleles are compromised 
in the role of LYSET in M6P-dependent lysoso- 
mal protein sorting. 


Conclusion 


We establish LYSET as a Golgi-resident protein 
essential for M6P-mediated lysosomal protein 
trafficking. Our results support a model in 
which LYSET is essential for the activity of 
GlcNAc-1-phosphotransferase by binding to 
and retaining it in the Golgi apparatus. LYSET 
is relevant to human disease as patients with 
biallelic LYSET mutations suffer a genetic in- 
herited disorder (29). Based on our elucidation 
of the role of LYSET in cell physiology, we 
propose that this disorder is similar to MLII.
Because the clinical symptoms of mucolipidosis 
and mucopolysaccharidosis overlap and not all 
cases can be explained by mutations in known 
disease genes, LYSET sequencing may help to 
identify disease-causing mutations and, con- 
sequently, provide a more accurate diagnosis 
in patients. Furthermore, as an important com- 
ponent of lysosomal function, LYSET is essen- 
tial for infection by diverse highly pathogenic 
viruses that rely on endolysosomal activation by
cathepsins. 


Materials and methods summary 


A detailed version of the materials and methods 
is provided in the supplementary materials. 
In brief, genome-scale CRISPR-Cas9 knockout
libraries were generated in eHAP and U87MG
cells. Libraries were infected with reovirus type
3D to select for a cell population containing 
mutations that confer resistance to viral in- 
fection. After genomic DNA isolation, PCR was 
used to amplify sequences encoding guide RNAs 
and their abundance was quantified using next 
generation sequencing. Statistical analysis was 
performed to identify and rank genes that were 

Fig. 5. Lyset KO mice display characteristics used to diagnose ML-II disease including elevated blood
serum levels of lysosomal enzymes and aberrant lysosomes in isolated cells. (A) Relative enzyme
activities of α-mannosidase (α-man), β-hexosaminidase (β-hex), α-L-fucosidase (α-L-fuc) and β-galactosidase
(β-gal) from blood sera of adult WT and Lyset KO mice; WT activities set to 1; mean ± SD, n = 5 mice,
***P < 0.001, ****P < 0.0001, unpaired Student's t-test (two-tailed). (B) Electron micrographs of WT and
Lyset KO MEFs. Scale bar, 1 μm. (C) Numbers of lysosomes and areas counted for 25 independent images for
WT and Lyset KO MEFs; mean ± SEM; unpaired, two-tailed Student's t-test with Welch's correction; *P < 0.05,
**P < 0.0001. (D) Time course of VSV-EBOV-GP infection of WT and KO MEFs (mean ± SD, n = 3).
Fig. 6. Pathogenic LYSET mutations fail to restore lysosomal transport defects in LYSET KO cells. (A) Immunoblot of 293FT
LYSET KO cells complemented with WT LYSET (addback) or human variants. R45W an 


enriched in the selected population compared 
with the uninfected population. 
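
The summary above does not specify which statistical method was used to rank genes. As a minimal sketch of the general idea only, the example below computes a depth-normalized log2 fold change per guide RNA between a selected and an uninfected population and ranks genes by the median across their guides. The guide names, gene names, counts, and pseudocount are hypothetical, and the median-based score is an assumption, not the authors' analysis.

```python
import math
from collections import defaultdict
from statistics import median


def guide_log2fc(selected, control, pseudocount=1.0):
    """Per-guide log2 fold change of counts-per-million (selected vs. control)."""
    sel_total = sum(selected.values())
    ctl_total = sum(control.values())
    lfc = {}
    for guide, ctl_count in control.items():
        # Normalize to sequencing depth, then add a pseudocount to avoid log(0).
        sel_cpm = selected.get(guide, 0) / sel_total * 1e6 + pseudocount
        ctl_cpm = ctl_count / ctl_total * 1e6 + pseudocount
        lfc[guide] = math.log2(sel_cpm / ctl_cpm)
    return lfc


def rank_genes(lfc, guide_to_gene):
    """Score each gene by the median log2 fold change of its guides, highest first."""
    per_gene = defaultdict(list)
    for guide, value in lfc.items():
        per_gene[guide_to_gene[guide]].append(value)
    scores = {gene: median(values) for gene, values in per_gene.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy counts: guides for GENE_A are enriched after selection.
control = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
selected = {"g1": 900, "g2": 700, "g3": 10, "g4": 5}
guide_to_gene = {"g1": "GENE_A", "g2": "GENE_A", "g3": "GENE_B", "g4": "GENE_B"}

ranking = rank_genes(guide_log2fc(selected, control), guide_to_gene)
print(ranking)
```

Dedicated screen-analysis tools model count variance explicitly, so a sketch like this is only suitable for building intuition about what "enriched in the selected population" means.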

LYSET KO mutations were introduced in 
different human cell lines using CRISPR-Cas9. 
Cells were infected with reovirus type 3D and 
viral infection was assessed using crystal violet 
assay, RT-qPCR and plaque forming assay. For 
infections with VSV pseudotyped with EBOV 
GP or SARS-CoV-2 S, viral entry was assayed 
using live cell imaging with an Incucyte system 
to monitor VSV-encoded GFP expression. 

The endogenous expression of proteins in- 
volved in M6P lysosomal protein transport 
was determined in extracellular medium, in- 
tracellular lysates and subcellular organelle 
fractions in WT and KO cells using immuno- 
blotting, lysosomal enzyme activity assays and 
mass-spectrometry. Binding between LYSET 
and GNPTAB was determined using coimmu- 
noprecipitations from lysates of cells transfected 
with plasmids encoding epitope-tagged proteins. 

All experiments involving mice were approved 
by Stanford’s Institutional Animal Care and 
Use Committee. C57BL/6J zygotes were pro- 
nuclear injected with Cas9 RNPs targeting 
Lyset (Tmem251) with two synthetic gRNAs to
introduce a frameshift mutation. 

For electron microscopy, cells were fixed, 
osmicated, and Epon polymerized. Ultrathin 
sections (60 nm) were prepared and examined 
in an EM902 microscope. For postembedding 
immunogold labeling, ultrathin sections (60 nm) 
were prepared from cryoprotected cell pellets 
(2.3 M sucrose), collected on Carbon-Formvar- 
coated nickel grids, and incubated with one or 
two antibodies followed by protein A coupled
to colloidal gold particles. Images were acquired
with a JEM-2100Plus Transmission
Electron Microscope. 

For immunofluorescence, cells treated or not 
for 24 hours with lysosomal protease inhibitors 
(leupeptin, pepstatin A and E64d) or bafilomycin
A1 were fixed with 4% PFA, blocked and
associated with disease. (B) Immunoblots of 100K-enriched Golgi/ER fractions from 293FT KO cells complemented with indicated LYSET alleles. (C) Immunoprecipitation
(IP) on lysates of cells expressing epitope-tagged proteins using anti-MYC magnetic beads followed by immunoblot analysis with indicated antibodies.
DEVELOPMENT


Translatome and transcriptome co-profiling reveals 
a role of TPRXs in human zygotic genome activation 


Zhuoning Zou†, Chuanxin Zhang†, Qiuyan Wang†, Zhenzhen Hou†, Zhuqing Xiong, Feng Kong,
Qiujun Wang, Jinzhu Song, Boyang Liu, Bofeng Liu, Lijuan Wang, Fangnong Lai, Qiang Fan, 
Wenrong Tao, Shuai Zhao, Xiaonan Ma, Miao Li, Keliang Wu, Han Zhao*, Zi-Jiang Chen*, Wei Xie* 


INTRODUCTION: During the mammalian oocyte- 
to-embryo transition (OET), translation plays 
a critical role in regulating meiotic resumption, 
zygotic genome activation (ZGA), and early 
embryonic development. ZGA marks the first 
transcription event in a new life and the on- 
set of the embryonic program. However, how 
mammalian ZGA is initiated remains poorly 
understood. For example, although key ZGA 
transcription factors (TFs) have been well 
characterized in other species such as zebra- 
fish and fly, which TFs control human ZGA 
remains elusive. 


RATIONALE: Studying the translatomes in hu- 
man oocytes and early embryos is critical for 
understanding their posttranscriptional reg- 
ulation during human OET and identifying 
candidate ZGA regulators. In particular, the 
TFs regulating ZGA are expected to arise either 
from translation of oocyte-inherited transcripts
or from transcription during the early phase of 
ZGA, and their motifs should be enriched at 
enhancers and promoters of ZGA genes. How- 
ever, translatome profiling in human oocytes 
and early embryos is severely hindered by the 
scarcity of research materials. Therefore, we 
sought to first profile the translatomes and 
transcriptomes from the same low-input sam- 
ples of human oocytes and early embryos using 
an ultrasensitive method. Combined with analy- 
ses of the assay for transposase-accessible chro- 
matin sequencing (ATAC-seq) datasets in human 
early embryos that we reported previously, can- 
didate TFs for human ZGA were then identi- 
fied. Their potential roles in genome activation 
and early development were assessed by gene 
knockdown in human embryos and overex- 
pression analyses in human embryonic stem 
cells (hESCs). 
Translatome-transcriptome co-profiling in human oocytes and early embryos reveals key human
ZGA factors. The translatome and transcriptome were co-profiled in human oocytes and early embryos. 
Comparison with mouse reveals widespread species-specific translation from oocytes (full-grown oocytes 
depicted) to embryos, in part driven by distinct configurations of regulatory elements in 3'UTRs, 
including CPEs. The TPRX family TFs TPRXL/1/2 are highly translated around ZGA and are identified 


as critical regulators of human ZGA. 
RESULTS: By combining the ultrasensitive Ribo-lite
(ligation-free, ultra-low-input, and enhanced
Ribo-seq) with Smart-seq2 in a method we call 
Ribo-RNA-lite (R2-lite), we jointly profiled the 
translatome and transcriptome across eight stages 
of human oocytes and early embryos. Through 
comparison with their counterparts in mouse, we 
found not only genes with conserved transla- 
tional activities but also widespread, differen- 
tially translated genes functioning in epigenetic 
reprogramming, transposon defense, and small 
RNA biogenesis. Species-specific translation is in 
part driven by different configurations of reg- 
ulatory elements such as cytoplasmic polyadenyl- 
ation element (CPE) and polyadenylation signal 
site (PAS) in the 3’ untranslated regions (3'UTRs). 
Using the R2-lite data, we found that a group 
of PRD-like homeobox TFs became highly trans- 
lated before or during ZGA, with their motifs 
enriched in distal open chromatin regions (pu- 
tative enhancers) near activated genes upon 
ZGA. These TFs include TPRXL, which is en- 
coded by a CPE-containing maternal transcript 
subjected to translation up-regulation upon 
meiotic resumption, and TPRX1/2, which are 
expressed during the early phase of ZGA (mi- 
nor ZGA). The joint knockdown of TPRX1/2/L 
[TPRX triple KD (TKD)] led to severe defects 
in development and ZGA. About 31% of ZGA 
genes that preferentially contain PRD-like TF- 
binding motifs at promoters and nearby pu- 
tative enhancers were down-regulated in TPRX 
TKD embryos. These TPRX target genes include 
ZSCAN4, DUXB, DUXA, NANOGNB, DPPA4, 
GATA6, DPRX, ARGFX, RBP7, and KLF5, many 
of which encode key transcription regulators. 
Finally, ectopically expressed TPRXs could bind 
and activate a subset of ZGA genes in hESCs. 


CONCLUSION: Here, we charted the translational 
landscapes during the human OET. Comparison 
of data in mouse identified widespread differ- 
entially translated genes, in part driven by 
species-specific configuration of key regula- 
tory elements in the 3'UTRs. This dataset fur- 
ther identified a group of PRD-like homeobox 
TFs, including TPRXL, TPRX1, and TPRX2, 
that are highly translated around ZGA. TPRXs 
are required for proper ZGA and preimplanta- 
tion development and are also sufficient to ac- 
tivate key ZGA genes when ectopically expressed 
in hESCs. Therefore, these data not only reveal 
the conservation and divergence of transla- 
tional regulation during OET but also identify 
critical TF regulators of human ZGA.

Translatome and transcriptome co-profiling reveals
a role of TPRXs in human zygotic genome activation

Translational regulation plays a critical role during the oocyte-to-embryo transition (OET) and zygotic
genome activation (ZGA). Here, we integrated ultra-low-input ribosome profiling (Ribo-lite) with 
messenger RNA sequencing to co-profile the translatome and transcriptome in human oocytes and early 
embryos. Comparison with mouse counterparts identified widespread differentially translated genes
functioning in epigenetic reprogramming, transposon defense, and small RNA biogenesis, in part driven 
by species-specific regulatory elements in 3’ untranslated regions. Moreover, PRD-like homeobox 
transcription factors, including TPRXL, TPRX1, and TPRX2, are highly translated around ZGA. TPRX1/2/L 
knockdown leads to defective ZGA and preimplantation development. Ectopically expressed TPRXs bind 
and activate key ZGA genes in human embryonic stem cells. These data reveal the conservation and 
divergence of translation landscapes during OET and identify critical regulators of human ZGA. 


Human infertility is a rising global issue,
and understanding human oocyte mat- 
uration and early embryonic development 
is critical in deciphering its etiology. After 
entering meiosis, mammalian oocytes 
are arrested at prophase I and only resume 
meiosis upon hormone stimulation. The ge- 
nome stays transcriptionally silenced from 
late-stage full-grown oocytes (FGOs) to fertilized
embryos before zygotic genome activation
(ZGA) (1). As a result, the oocyte-to-embryo
transition (OET) is often driven by posttranscriptional
regulation (2). This involves stage-specific
translational regulation of pretranscribed
mRNAs in oocytes, including translational activation
of “dormant” maternal mRNAs, which
initially stay untranslated after being transcribed
(1, 3).

In Xenopus oocytes, translation is mediated 
by cis-acting sequences in the 3’ untranslated 
regions (3'UTRs), including cytoplasmic poly- 
adenylation element (CPE) bound by cytoplas- 
mic polyadenylation element binding protein 
1 (CPEB1) and the polyadenylation hexanu- 
cleotide signal (PAS) recognized by the cleavage 
and polyadenylation specificity factor (CPSF) 
(3-6). CPEB1 suppresses polyadenylation in 
FGOs but promotes polyadenylation during 
meiotic resumption upon phosphorylation. In 
the mouse, CPEB1 is essential for fertility and
oocyte maturation, and its phosphorylation 
similarly results in translational activation of 
several dormant RNAs (7, 8). However, how 
translation is regulated in human OET is poorly
studied. 

ZGA marks the first transcription event in a 
new life (9), but how it is initiated in mammals 
remains elusive. DUX in the mouse and its 
counterpart, DUX4, in humans activate a sub- 
set of genes preferentially expressed during 
minor ZGA, the early wave of ZGA (e.g., Zscan4/ 
ZSCAN4 and MERVL/HERVL) (10-12). How- 
ever, Dux deficiency in mice causes only minor 
defects in ZGA and is compatible with development
(13-16). Similarly, knockdown (KD)
of DUX4 in human embryos disturbs gene 
expression without affecting preimplantation 
development during the studied period (17). 
Moreover, Dux and DUX4 themselves are activated
during minor ZGA (10-12). Thus, additional
key transcription factors (TFs) for
mammalian ZGA remain to be discovered. 
ZGA regulators may arise from the translation 
of maternally deposited RNAs, as exemplified 
by Nanog, SoxB1, and Pou5f1 in zebrafish and
Zelda in fly (9). Recently, we reported an ultra- 
sensitive ribosome profiling method [ligation- 
free, ultra-low-input and enhanced Ribo-seq 
(Ribo-lite)] that can accommodate as few as 
50 murine embryonic stem cells (mESCs) or 
single mouse oocytes (18). Here, we applied
Ribo-lite coupled with RNA sequencing (RNA- 
seq) to human oocytes and early embryos to 
investigate the translational regulation during 
human OET and to identify possible regula- 
tors of human ZGA. 


Results 

Co-profiling of translatome and transcriptome 
landscapes of human oocytes and 
preimplantation embryos by Ribo-RNA-lite 


We combined Ribo-lite (18) with Smart-seq2 (a
low-input RNA-seq method) (19) by splitting 
the lysed samples for transcriptome and trans- 
latome co-profiling in a method we call Ribo- 
RNA-lite (R2-lite) (Fig. 1A and fig. S1A). The 
global patterns of ribosome-protected frag- 
ments (RPFs) and mRNAs from R2-lite in 
1000 human embryonic kidney 293 (HEK293) 
cells correlated with those of the conventional 
Ribo-seq data (20) and the reference Smart-seq2
data (18), respectively (fig. S1B). RPFs, but
not mRNAs, showed typical Ribo-seq features, 
including the expected mapping rates (fig. 
S1C), the depletion of signals from 3'UTRs (fig. 
S1, D and E), and the characteristic 3-nucleotide 
(nt) periodicity (fig. S1F). Exogenous External
RNA Controls Consortium (ERCC) RNA added 
to the samples was well detected by mRNA-seq
but not by Ribo-lite (fig. S1G). These data
demonstrate that R2-lite could efficiently pro- 
file the transcriptome and translatome from 
the same low-input sample. 
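
The 3-nt periodicity mentioned above reflects ribosomes advancing one codon at a time, so P-site positions of ribosome-protected fragments concentrate in one reading frame. The sketch below shows one simple way to summarize this as per-frame fractions. It assumes P-site positions have already been assigned (the offset step is not shown), and the positions used are made up for illustration; it is not the authors' pipeline.

```python
from collections import Counter


def frame_fractions(psite_positions, cds_start):
    """Fraction of P-site positions in each codon frame (0, 1, 2) relative to the CDS start."""
    if not psite_positions:
        raise ValueError("no P-site positions supplied")
    counts = Counter((pos - cds_start) % 3 for pos in psite_positions)
    total = sum(counts.values())
    return [counts.get(frame, 0) / total for frame in range(3)]


# Hypothetical P-site coordinates on a transcript whose CDS starts at position 100.
psites = [100, 103, 106, 109, 112, 101, 104, 102]
fractions = frame_fractions(psites, cds_start=100)
print(fractions)  # [0.625, 0.25, 0.125]
```

A strong excess in a single frame is what would be expected for RPFs, whereas fragments from total mRNA would be expected to spread roughly evenly across the three frames.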

Next, we performed R2-lite in human oocytes 
and early embryos (using four to 25 oocytes 
or embryos for each stage and two replicates; 
table S1), including FGOs, metaphase I (MI) 
oocytes, metaphase II (MII) oocytes, one-cell 
(1C), two-cell (2C), four-cell (4C), and eight-cell 
(8C) embryos, as well as inner cell mass (ICM) 
from the blastocysts (fig. S1A). We also included
primed human embryonic stem cells
(hESCs) as a control. Ribo-lite captured 7790 
to 10,854 expressed genes [fragments per kilo- 
base per million (FPKM) > 1] and mRNA-seq 
detected 7689 to 9056 genes (FPKM > 1) across 
stages (fig. S2A and FPKM in table S2) with 
high reproducibility (R > 0.75) (fig. S2, B and
C). The variations between replicates were com- 
parable to the Ribo-seq data in mouse oocytes 
and early embryos (18, 21), as well as mESCs
(22) (fig. S2D). Ribo-lite data also showed the 
expected low 3’UTR read rates (fig. S3A) and 
prominent 3-nt periodicity (fig. S3, B and C)

Fig. 1. Genome-wide mapping of translatome and transcriptome in human oocytes and mouse early
embryos. (A) Schematic of R2-lite. (B) Principal component analysis (PCA) of transcriptomes and
translatomes for human oocytes and early embryos. (C) Schematic of human and mouse oocytes and embryos
used for comparison. (D) Heatmaps showing the RPF of human and mouse homologous genes with conserved
translation patterns. The enriched Gene Ontology (GO) terms and genes are also listed. (E) Boxplots of CPE density
(count per nucleotide), CPE number, papCPE number, and 3'UTR length (in nucleotides) of human-mouse
conserved genes of each class in (D). (F) Top, pie charts showing the percentages of destabilized and stable
maternal mRNAs among FGO-MII translationally down-regulated genes in human and mouse. Bottom, scatterplots
comparing RPF fold change (log2 ratio) with mRNA fold change (log2 ratio) from FGO to MII oocytes in human
and mouse. Dashed lines indicate a threefold change. (G) Boxplots of CPE density (count per nucleotide), CPE
number, papCPE number, and 3'UTR length (in nucleotides) of all FGO-MII translationally down-regulated genes
(green, sum of stable and destabilized maternal mRNA), destabilized (red), and stable (blue) mRNAs in human and
mouse. P values (unpaired two-sample t test) are as follows: ***P < 0.001; ****P < 0.0001; ns, not significant.


comparable to the bulk HEK293 reference (20) 
and low-input hESCs. One of the “gold stan- 
dards” of translation profiling in oocytes is 
“dormant RNAs,” which show low translation 
efficiency (TE) in FGOs but become actively 
translated upon meiotic resumption (1, 3). We
collected a number of well-known dormant 
mRNAs reported in mouse, as well as potential 
dormant mRNAs from the proteome of human 
oocytes (23), including key regulators for meiotic
maturation (MOS, WEE2, and FBXO43),
maternal RNA clearance (BTG4, CNOT6L/7/8,
and DCP1A), cell-cell adhesion (LIMA1 and
CTNNB1), mRNA splicing regulators (WTAP 
and MAGOH), a DNA methylation regulator 
(TET3), TF (NFYB), pyridoxal phosphate-binding 
protein (ACCSL), and epoxide hydrolase (LTA4H) 
(18, 23). Most of these mRNAs exhibited trans- 
lational repression in FGOs (low TE or RPF/ 
mRNA) but became activated for translation in 
MII oocytes (fig. S4A), as supported by TE 
analysis (fig. S4, B and C), a result reproduced 
in replicates (fig. S4, A and B). This contrasted 
with their mRNA levels, which were much less 
dynamic (fig. S4D, “BTG4” and “CNOT6L”). The
moderate increases of mRNAs from FGOs 
to MII oocytes were likely caused by in- 
creased capture efficiency of lengthened 
poly(A) tails upon meiotic resumption by 
mRNA-seq (18, 19, 24, 25). Finally, we also identified
differentially translated genes (DEGs)
or RPF DEGs between every two consecutive 
stages, which revealed marker genes and gene 
functional enrichment that generally match 
the expected human developmental processes 
(fig. S5). For example, translation up-regulation 
was observed for the marker genes ZSCAN4, 
ZSCAN5A/B/C, and DUXA/B from 2C to 4C and
GDF3/PDGFRA/DNMT3L/POU5F1 from 8C to
ICM. Therefore, R2-lite successfully captures 
transcriptional and translational activities in 
human oocytes and early embryos. 
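
Two quantities used throughout this section, FPKM and translation efficiency (TE, the RPF/mRNA ratio), can be written out explicitly. The sketch below is a minimal illustration with made-up counts and gene length. The cutoff that skips TE for weakly expressed mRNAs is an assumption added to avoid unstable ratios, not a threshold taken from the text.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of gene length per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def translation_efficiency(rpf_fpkm, mrna_fpkm, min_mrna_fpkm=1.0):
    """TE as the RPF/mRNA ratio; returns None when mRNA is below the (assumed) cutoff."""
    if mrna_fpkm < min_mrna_fpkm:
        return None
    return rpf_fpkm / mrna_fpkm


# Hypothetical gene of 2000 bp measured in both libraries of the same sample.
mrna_level = fpkm(fragments=500, gene_length_bp=2000, total_mapped_fragments=10_000_000)
rpf_level = fpkm(fragments=100, gene_length_bp=2000, total_mapped_fragments=5_000_000)
te = translation_efficiency(rpf_level, mrna_level)
print(mrna_level, rpf_level, te)
```

In this picture, a dormant mRNA would show a low TE in FGOs that rises in MII oocytes while its mRNA FPKM changes comparatively little.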


Dynamic human translatome in oocytes 
and early embryos 


Human ZGA occurs in two waves: minor ZGA 
at 1C to 4C and major ZGA around 8C (26). For 
convenience, we use “ZGA” to refer to major 
ZGA unless otherwise noted. We found that 
the transcriptomes were similar from FGO to 
4C, followed by a major switch around 8C (Fig. 
1B). By contrast, translatomes were more dynamic,
showing multiple transitions upon meiotic
resumption (FGO to MI and MI to MII) and
the onset of major ZGA (from 4C to 8C) (Fig. 1B 
and fig. S6A). At each stage, the translatome and 
transcriptome exhibited distinct landscapes in 
human oocyte and pre-ZGA embryos, but they 
tended to show more concordance after ZGA 
(fig. S6B). On the basis of their translation dy- 
namics, we classified genes into four major 
classes (fig. S6C, left, and table S3). The first 
class, “OET up-regulated” genes, were trans- 
lated at low levels in FGOs but at high levels 
in MII oocytes. Examples included those in- 
volved in mRNA destabilization (BTG4, PAN2/3, 
and CNOT6/6L/7/8) and protein ubiquitination 
(FBXs and RNFs). This was consistent with 
the notion that maternal mRNA and protein 
clearance are required for proper OET and 
ZGA (2). This class also included key cell-
cycle genes (i.e., CCNB1 and CDC20) that regulate
meiotic resumption and progression. The
second class were “oocyte-specific” genes. These
were highly translated from the FGO stage, but
their translation was gradually down-regulated 
during OET, and they were barely translated 
after ZGA (e.g., ZP1/2/3/4 and GDF9). The third 
class, “OET down-regulated” genes, were highly 
translated in FGOs, but their translation ceased 
in MII oocytes, before it was reactivated in early 
embryos. This class included genes that were 
involved in ribosome biogenesis (RPS/RPL), the 
mitochondrial respiratory chain (NDUF dehydrogenase),
and transcription (POLR). The
fourth class, “embryonic” genes, were tran- 
scribed and translated around or after ZGA 
and preferentially functioned in transcription 
and embryonic development. The corresponding
mRNA levels for these four classes were
generally correlated with the RPF levels, but 
there were differences (fig. S6C, arrows). Thus, 
translation undergoes dynamic regulation in 
human oocytes and early embryos. 
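
The four classes described above can be illustrated with a simple rule-based sketch. The stage names follow the text, but the `high` and `low` cutoffs, the use of a single 8C value to represent "early embryos", and the example profiles are assumptions made for illustration; the actual classification procedure is not described in this section.

```python
def classify(rpf, high=10.0, low=1.0):
    """Assign a gene to one of the four translation classes from RPF levels.

    `rpf` maps stage name -> RPF level. The cutoffs are hypothetical.
    """
    fgo, mii, emb = rpf["FGO"], rpf["MII"], rpf["8C"]
    if fgo <= low and mii >= high:
        return "OET up-regulated"       # low in FGO, high in MII
    if fgo >= high and mii <= low and emb >= high:
        return "OET down-regulated"     # high in FGO, off in MII, reactivated later
    if fgo >= high and emb <= low:
        return "oocyte-specific"        # high in FGO, barely translated after ZGA
    if fgo <= low and mii <= low and emb >= high:
        return "embryonic"              # translated around or after ZGA
    return "unclassified"


# Hypothetical RPF profiles (not real data).
examples = {
    "geneA": {"FGO": 0.5, "MII": 20.0, "8C": 5.0},
    "geneB": {"FGO": 50.0, "MII": 0.2, "8C": 30.0},
    "geneC": {"FGO": 50.0, "MII": 20.0, "8C": 0.1},
    "geneD": {"FGO": 0.1, "MII": 0.3, "8C": 40.0},
}
classes = {gene: classify(profile) for gene, profile in examples.items()}
```

The four conditions are mutually exclusive by construction, so the order of the checks does not affect the result.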


Conservation of human and mouse translational 
landscapes during OET 


We investigated to what extent gene expres- 
sion during OET is conserved between human 
and mouse by comparing their translatomes 
and transcriptomes from the equivalent stages 
(18), first focusing on the homologous genes 
(Fig. 1C). The mouse data that we used to com- 
pare with the data from human MI oocytes 
were from a slightly earlier stage of late pro- 
metaphase I (LPI). We also grouped mouse 
genes into the same four classes on the basis of 
their translation dynamics (table S3). Overall, a 
substantial percentage of human genes in the 
four classes (51.9% of OET up-regulated, 29.7% 
of oocyte-specific, 61.1% of OET down-regulated, 
and 45.7% of embryonic) fell in the same 
classes in mouse (Fig. 1D and fig. S6C; see fig. 
S6D for examples). Together, they constituted 
50.5% (3018 of 6106) of human genes from 
the four classes combined. The similar trans- 
lation dynamics of these genes across species 
suggest strong evolutionary constraints be- 
cause of their essential roles during OET (e.g., 
RNA degradation, oocyte development, trans- 
lation, and embryonic development) (Fig. 1D). 
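
The per-class and overall concordance reported above amounts to counting homologous genes that fall in the same class in both species. The sketch below shows one way to compute it, assuming each species is represented by a mapping from a shared homolog identifier to a class label; the identifiers and labels in the example are hypothetical.

```python
from collections import Counter


def class_concordance(human_class, mouse_class):
    """Return (per-class fraction, overall fraction) of human genes whose
    mouse homolog falls in the same class. Genes missing in mouse are skipped."""
    total, same = Counter(), Counter()
    for gene, h_cls in human_class.items():
        m_cls = mouse_class.get(gene)
        if m_cls is None:
            continue
        total[h_cls] += 1
        if m_cls == h_cls:
            same[h_cls] += 1
    per_class = {cls: same[cls] / total[cls] for cls in total}
    overall = sum(same.values()) / sum(total.values()) if total else 0.0
    return per_class, overall


# Hypothetical class assignments keyed by a shared homolog ID.
human = {"g1": "OET up-regulated", "g2": "OET up-regulated",
         "g3": "embryonic", "g4": "oocyte-specific"}
mouse = {"g1": "OET up-regulated", "g2": "OET down-regulated",
         "g3": "embryonic", "g5": "embryonic"}
per_class, overall = class_concordance(human, mouse)
```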

Translation repression in FGOs and subse- 
quent activation upon meiotic resumption are 
critically controlled by CPE through dynamic 
regulation of deadenylation and polyadenyla- 
tion (3, 4). CPEs are most effective when they 
are near PASs [PAS proximal CPEs (papCPEs)] 
within ~100 nt in mouse (8), consistent with 
previous studies (6, 27, 28). A similar trend was 
observed for humans: The translation of genes 
with papCPEs tended to be repressed in FGOs 
and reactivated in MII oocytes (fig. S7, A and 
B). CPEs within 100 to 150 nt of PASs were also 
effective but only when more than one was 
present, and the number of such genes was 
limited (fig. S7A). By contrast, the translation 
of genes with no CPEs was preferentially down-
regulated during the same period; the translation
of those with non-papCPEs remained
largely constant (fig. S7, A and B). Globally, 
CPE densities, CPE counts, and papCPE num- 
bers were highest for OET up-regulated tran- 
scripts and lowest for OET down-regulated 
transcripts both in human and mouse (Fig. 1E). 
It has been shown that short 3’UTRs promote 
deadenylation in zebrafish and mammalian 
embryos (29-31). Consistently, OET up-regulated 
and OET down-regulated transcripts showed 
longer and shorter 3’UTRs, respectively, in hu- 
man and mouse (Fig. 1E). Therefore, the rela- 
tionship between CPE and translation regulation 
is likely conserved between human and mouse. 
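
The 3'UTR features discussed above (CPE count, CPE density, and papCPE number) can be sketched as a simple sequence scan. This is an illustration under stated assumptions: the motifs used are the commonly cited consensus sequences (CPE as UUUUAU or UUUUAAU, PAS as AAUAAA or AUUAAA), the 100-nt window follows the "~100 nt" figure in the text, and the distance is measured as the gap between the two elements, with a negative value meaning overlap. The motif definitions and distance convention actually used in the study may differ.

```python
import re

# Assumed motif definitions, written as DNA. Lookaheads allow overlapping hits.
CPE_RE = re.compile(r"(?=(TTTTAT|TTTTAAT))")
PAS_RE = re.compile(r"(?=(AATAAA|ATTAAA))")


def _spans(pattern, seq):
    """Return (start, end) for every match of the captured motif."""
    return [(m.start(1), m.end(1)) for m in pattern.finditer(seq)]


def cpe_features(utr, max_dist=100):
    """Count CPEs, CPE density (count per nucleotide), and PAS-proximal CPEs."""
    seq = utr.upper().replace("U", "T")
    cpes = _spans(CPE_RE, seq)
    pas_sites = _spans(PAS_RE, seq)
    pap = 0
    for c_start, c_end in cpes:
        # Gap between a CPE and each PAS; negative means the elements overlap.
        gaps = [max(p_start - c_end, c_start - p_end) for p_start, p_end in pas_sites]
        if gaps and min(gaps) <= max_dist:
            pap += 1
    return {
        "cpe_count": len(cpes),
        "cpe_density": len(cpes) / len(seq) if seq else 0.0,
        "papcpe_count": pap,
    }


# Synthetic 3'UTR: one CPE followed 20 nt later by one PAS (not a real sequence).
utr = "GGC" + "TTTTAT" + "C" * 20 + "AATAAA" + "G" * 10
features = cpe_features(utr)
```

Tightening `max_dist` below the 20-nt gap in this synthetic sequence removes the CPE from the papCPE count while leaving the total CPE count unchanged.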


Widespread maternal RNA destabilization occurs 
in mouse MII oocytes 


Translation of the OET down-regulated genes 
was down-regulated from FGO to MII oocytes. 
This was frequently accompanied by decreases 
in mRNAs in mouse (1445 of 3305 genes, or 
43.7%), but not in human (only 260 of 2829, 
or 9.1%) (Fig. 1F, also see fig. S6C, mRNA-OET 
down-regulated, compare mouse and human). 
We considered the transcripts with mRNA de- 
crease to be “destabilized,” which may reflect 
either the deadenylation or decay of RNA, and 
those that did not show evident mRNA de- 
crease to be “stable.” Indeed, two recent studies 
showed that although widespread deadenyla- 
tion occurs from FGOs to MII oocytes in both 
species, human MII oocytes retain much longer
poly(A) tails than mouse MII oocytes
(32, 33). Stable and destabilized
genes were generally enriched for ribosomal 
protein and ATP-binding factors, respectively, 
and mitochondrion genes were enriched in 
both groups (fig. S8A). 

We then investigated what factors might un- 
derlie such a differential deadenylation. Several 
key factors involved in decapping and deadenyl- 
ation, such as BTG4, CNOT6L/7/8, and PAN2/3, 
but not CNOT6, were expressed more highly in 
mouse than in human (fig. S8B), although their 
true translation levels remained unknown be- 
cause of the lack of spike-in calibration in the 
Ribo-lite data. The lower numbers of CPEs, es- 
pecially papCPEs, in the 3'UTRs are associated 
with decreased translation and increased RNA 
instability and RNA deadenylation in mouse 
oocytes upon meiotic resumption (18). Consistently,
genes belonging to the OET down-
regulated class generally showed overall low
CPE counts and short 3’UTRs both in mouse 
and human (Fig. 1E), but mouse genes showed 
even lower CPE counts, CPE densities, and shorter 
3'UTRs (Fig. 1G). The destabilized genes in hu- 
man (9.1%) also showed low CPE and papCPE 
counts and short 3'UTRs compared with stable
genes, similar to those in mouse (Fig. 1G).
These data suggest that for OET down-regulated
genes, mouse tends to have fewer CPEs and
shorter 3'UTRs than human, accompanied by
more-deadenylated and less-stable RNAs in
MII oocytes. A similar trend was observed for
oocyte-specific genes (fig. S6C). 

Human OET down-regulated genes were 
reactivated for translation before or during 
ZGA (Fig. 1D). For some genes, the resumed 
translation occurred before major ZGA, sug- 
gesting translation reactivation of existing 
mRNAs in anticipation of ZGA (fig. S8C, “pre- 
ZGA up”). We found that these genes were 
enriched for transcription regulators. Such 
pre-ZGA translation up-regulation did not sim- 
ply arise from the minor ZGA, because the 
majority of the corresponding transcripts (81.7% 
for the stable group and 53.5% for the desta- 
bilized group) were insensitive to treatment with 
the transcription inhibitor α-amanitin (24)
(fig. S8D). Furthermore, stable mRNAs were 
more likely to be up-regulated in pre-ZGA em- 
bryos than destabilized mRNAs in both human 
(54.4 versus 28.5%, respectively) and mouse 
(48.8 versus 37.5%, respectively) (fig. S8E), suggesting
that their stability in oocytes may
benefit their potential functions in early de- 
velopment and ZGA. Such translation up- 
regulation is relative to other genes in the 
genome because of the lack of the spike-in 
calibrated absolute measurement of translation
by Ribo-seq for low-input samples (18).


CPE-mediated regulatory codes underlie
the divergence of human and mouse translatome 
during OET 


Of human-mouse homologous genes, 49.5% 
showed species-specific translation dynamics 
(Fig. 2A). For example, the component for Poly- 
comb repressive complex 2, EED, belonged 
to the OET up-regulated class in mouse but to 
the embryonic class in human (fig. S9A). Its 
presence in mouse but not in human pre-ZGA 
embryos was consistent with the propagation 
of oocyte H3K27me3 to early embryos as non- 
canonical imprints in mouse, but not human, 
embryos (34-36). Other divergent gene pairs 
included those involved in protein degradation, 
histone methyltransferases, and demethylases 
(fig. S9A). We also confirmed such species-specific
translation dynamics using two additional 
mouse translatome profiling datasets (fig. S9, 
B and C). 

We then investigated whether CPEs may 
underpin species-specific translation regu- 
lation. We identified 366 genes belonging to 
the human OET up-regulated class and the 
mouse OET down-regulated class. These genes 
tended to have higher CPE density and more 
papCPEs in human than in mouse (Fig. 2B). 
Conversely, genes that fell into human OET 
down-regulated and mouse OET up-regulated 
classes (N = 271) had lower CPE density and
fewer papCPEs in human than in mouse (Fig. 
2B). We then chose individual genes to assess 
the translational activities of 3'UTRs using green 
fluorescent protein (GFP) reporters. hDROSHA 
and mDrosha are human OET up-regulated
and mouse OET down-regulated genes. Consistently,
hDROSHA, but not mDrosha, contained
papCPEs in its 3'UTR (Fig. 2C).

Fig. 2. Divergence of translatomes
between human and mouse oocytes and
early embryos. (A) Alluvial diagram of
human-mouse homologous genes.
(B) Violin plots showing CPE
density (count per nucleotide) and
papCPE number for human OET
up-regulated/mouse OET down-regulated
gene pairs, human OET
down-regulated/mouse OET
up-regulated gene pairs, and all
homologous genes in human
(blue) and mouse (red). (C) Line
plots showing hDROSHA and
mDrosha RPF and mRNA dynamics
(mean with range). The CPE,
PAS elements, and their distances
in hDROSHA/mDrosha 3'UTRs
are shown below. (D) Normalized
GFP intensity of hDROSHA/mDrosha
3'UTR reporter in mouse oocytes.
(E) Line plots showing hNUP62
and mNup62 RPF and mRNA dynamics
(mean with range). The CPE,
PAS elements, and their distances
in hNUP62/mNup62 3'UTRs are
shown below. The negative distance
indicates that the two elements
overlap. (F) Normalized GFP
intensity of hNUP62/mNup62
3'UTR reporter in mouse oocytes.
(G) Schematic of the CPE, PAS
elements, and their distances for
WT and mutant hNUP62 or
mNup62 3'UTR. (H) Normalized
GFP intensity of WT and mutant
hNUP62/mNup62 3'UTR reporters
in mouse oocytes. P values
(unpaired two-sample t test) for
(B) to (H): *P < 0.05; **P < 0.01;
***P < 0.001; ****P < 0.0001;
ns, not significant.

This is
consistent with DROSHA and its processing 
of target microRNA being dispensable for 
mouse oocyte development but possibly being 
functional in human oocytes (37-39). Of note, 
hDROSHA was already translated in FGOs, and 
its up-regulation of translation upon meiotic 
resumption was also accompanied by an increase
in mRNA levels, likely caused by enhanced
polyadenylation, leading to insignificant
TE changes (fig. S10A). Nevertheless, when
3'UTR-GFP reporters were introduced into FGOs,
followed by in vitro maturation to MII oocytes
(fig. S10C), the hDROSHA-reporter, but not the mDrosha-
reporter, was highly translated in mouse oocytes
(Fig. 2D and fig. S10D). Conversely, hNUP62 and
mNup62 belonged to the human OET down-
regulated and mouse OET up-regulated class,
as confirmed by their TE dynamics (fig. S10B).
Accordingly, three papCPEs are present in 
mouse Nup62 3’'UTR but not in its human 
counterpart (Fig. 2E). NUP62 encodes a central 
channel nucleoporin component and also functions
in transcription and chromatin organization
(40). When a GFP reporter was introduced into
mouse FGOs, followed by in vitro maturation
toward the MII oocytes, the reporter with the
mNUP62 3'UTR (mNUP62-reporter), but not 
the hNUP62 3’UTR (hNUP62-reporter), was 
highly translated in MII oocytes, suggesting 
again that the 3’UTRs explain the species- 
specific translation (Fig. 2F and fig. S10E). 
Moreover, the hNUP62-reporter became highly
translated in mouse MII oocytes when papCPEs
from the mNup62 3'UTR were added to it. Conversely,
deleting papCPEs from the mNup62 3'UTR impaired
its translation activation (Fig. 2, G and
H, and fig. S10E). Therefore, these data support
the notion that distinct CPE usages underlie the 
differential translation between human and 
mouse. 

Finally, we also identified genes that were 
translated in one species but silenced throughout
OET in the other (fig. S11A). These genes
were enriched in homeobox TFs and immu- 
nity. We also analyzed genes that were present 
and translated in one species but absent in the 
other (fig. S11B), among which a large number 
(N = 117 in human and N = 21 in mouse) were 
zinc finger proteins. KRAB zinc finger proteins 
are known to repress transposable elements 
and are highly divergent among species (41).
Another notable family is the PRD-like TF
family, many members of which are present 
in human but not in mouse (42, 43). 


Pre-ZGA translation is essential for human ZGA 


We then investigated how translation activity 
regulated embryonic development and ZGA. 
Consistent with the results in mouse (18, 44-46), 
inhibition of translation by cycloheximide (CHX) 
in human 3 pronuclei (3PN) embryos (discarded
during clinical application) starting
from the 1C and 4C stage led to developmen- 
tal arrest at the 1C to 4C stage and the 5C to 9C 
stage, respectively (Fig. 3A and fig. S12A, right). 


Fig. 3. Pre-ZGA translation is
essential for human ZGA.
(A) Schematic of the CHX treatment
experiments in human early
embryos. The red and green
curves represent maternal and
zygotic transcripts, respectively.
(B) Percentages of persisting
maternal genes (mRNA FPKM >
50% of FGO, top), activated minor
ZGA genes (mRNA FPKM > 20%
of 4C, middle), and major ZGA
genes (mRNA FPKM > 20% of 8C,
bottom) for the reference data
(no treatment) (green), the DMSO
group (blue), and the CHX group
(red) on day 5 are shown. Some
minor ZGA and major ZGA genes
have nonzero FPKM in oocytes.
(C) Pie chart (top) showing the
proportions of α-amanitin-sensitive
and -insensitive genes in the MII-4C
translationally activated genes,
based on the reported human 8C
RNA-seq data (24). Boxplots show
the corresponding mRNA levels
(bottom). (D) Top 250 TFs based
on the RPF at the 4C or 8C stage,
and top 250 TF motifs enriched
at all distal accessible regions

We cultured the treated embryos to day 5 (with
the time of in vitro fertilization as day 0) and 
subjected them to RNA-seq. We observed se- 
verely impaired ZGA and maternal RNA clear- 
ance in embryos with CHX treatment from the 
1C stage (Fig. 3B and fig. S12A, left). Although 
these defects were alleviated when the embryos 
were treated with CHX from the 4C stage, a 
substantial portion (24.2%) of major ZGA genes 
still failed to be properly activated (Fig. 3B and 
fig. S12A, left). Minor ZGA in CHX-treated 4C 
embryos largely recovered, consistent with 
these genes being transcribed earlier in embryos.
Their expression became even higher
than that of control on day 5 (Fig. 3B and fig.
S12A, left), suggesting defects in timely silenc- 
ing or decay of their transcripts. Thus, pre-ZGA 
translation is required for both human ZGA 
and maternal RNA clearance. Such translation 
activation from MII oocyte to 4C embryos main- 
ly comes from maternal transcripts rather than 
from minor ZGA, because most of them were 
insensitive to α-amanitin treatment (24) (Fig. 3C).
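
The percentages in Fig. 3B score each gene against a threshold relative to a reference stage (for example, mRNA FPKM > 20% of the 8C level for major ZGA genes, or > 50% of the FGO level for persisting maternal genes). A minimal sketch of that calculation follows; the function name, gene names, and FPKM values are hypothetical, and only the threshold logic is taken from the figure legend.

```python
def percent_above(sample_fpkm, reference_fpkm, fraction):
    """Percentage of reference genes whose FPKM in the sample exceeds
    `fraction` times their FPKM in the reference stage."""
    genes = [g for g in reference_fpkm if g in sample_fpkm]
    if not genes:
        return 0.0
    hits = sum(sample_fpkm[g] > fraction * reference_fpkm[g] for g in genes)
    return 100.0 * hits / len(genes)


# Hypothetical major ZGA genes: reference 8C levels and a treated embryo on day 5.
reference_8c = {"a": 100.0, "b": 50.0, "c": 10.0, "d": 40.0}
treated_day5 = {"a": 30.0, "b": 5.0, "c": 1.0, "d": 9.0}
activated_pct = percent_above(treated_day5, reference_8c, fraction=0.20)
```

Here genes `a` and `d` clear the 20% threshold and `b` and `c` do not, so half of the toy gene set counts as activated.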


PRD-like homeobox TFs are highly translated 
around ZGA 


To identify specific ZGA regulators, we searched 
for TFs that were highly translated before and
around major ZGA. The top three TFs translated
at the 4C stage were OTX2, LEUTX, and
TPRXL (Fig. 3D), all belonging to the PRD-like 
homeobox TF family (42, 43, 47), which have a 
PRD class homeodomain but lack the PAIRED 
domain (43, 48). At the 8C stage, two additional
PRD-like homeobox TFs, TPRX1 and
TPRX2, became highly translated, consistent 
with their reported mRNA levels (42, 47, 49). 
There were at least 10 PRD-like homeobox 
TFs highly translated around ZGA (Fig. 3E). 
They can be classified into three groups on the 
basis of the timing of their translation. Ma- 
ternal TFs included OTX2 and TPRXL, which 
started to be translated upon oocyte meiotic 
resumption and began to decline after major 
ZGA (Fig. 3E). Consistently, OTX2 and TPRXL 
both contain a papCPE in their 3’UTRs (Fig. 
3E, bottom). Minor ZGA TFs included LEUTX, 
DUXA, DUXB, TPRX1, TPRX2, and CPHXL, 
because both their mRNA and their RPF levels 
started to increase at the 4C stage and culmi- 
nated at the 8C stage. DUXA and DUXB, as 
DUX family members, share similar DNA- 
binding motifs with DUX4 (50). Finally, major 
ZGA TFs, including ARGFX and DPRX, were 
activated at the 8C stage (Fig. 3E). Overexpres- 
sion studies in cell lines suggested a role for 
PRD-like homeobox TFs in regulating a subset 
of ZGA genes (47, 49, 51), although the exact 
roles in vivo remain largely unknown. They 
arose through gene duplication from an an- 
cestor gene, CRX, followed by asymmetric se- 
quence evolution (47). Moreover, the putative 
enhancers [distal accessible regions based on 
the assay for transposase-accessible chromatin 
sequencing (ATAC-seq) (24)] in human 8C em- 
bryos near major ZGA genes are enriched for 
TAATCC, a PRD-like TF-binding motif (42, 52) 
[represented by OTX2, CRX, GSC, and PITX1 
(table S4) in the HOMER motif database (53)] 
(Fig. 3D). These data are consistent with previous 
studies proposing that PRD-like homeobox TFs 
are involved in human ZGA (42, 43, 47, 49). 


TPRX factors regulate human preimplantation 
development 


We then investigated whether PRD-like homeo- 
box TFs play a role in human ZGA and early 
development. We prioritized the maternal TFs 
OTX2 and TPRXL and the minor ZGA TFs 
LEUTX (most translated at 4C to 8C) and TPRX1/2 
(related to TPRXL), because they are more 
likely to function upstream of ZGA. We attempted 
to knock down these TFs in 3PN human em- 
bryos. Although they exhibit a lower develop- 
mental rate toward blastocysts (54), 3PN embryos 
have similar accessible chromatin and histone 
mark landscapes (24, 35) and ZGA timing (24) 
(also see below) as 2PN embryos. Because we 
failed to efficiently deplete OTX2, and knock- 
ing down LEUTX had minor effects on ZGA and 
embryonic development (fig. S12B), we mainly 
focused on TPRX1/2/L.

TPRXL was suggested to arise from the reverse
transcription product of TPRX1 with further
sequence acquisition (43) (Fig. 4A). TPRXL was 
once annotated as a noncoding gene with a 
predicted open reading frame (ORF) (55). 
However, Ribo-lite signals in human embryos 
were enriched on a shorter ORF (Fig. 4B), con- 
sistent with a separate study (43). To confirm 
its presence, we generated an antibody against 
TPRXL and confirmed its specificity (fig. S12C). 
Immunostaining showed that TPRXL was trans- 
lated in oocytes and became strongly enriched 
in the nucleus in 2C to 8C embryos until the 
blastocyst stage (Fig. 4C). The small interfering 
RNA (siRNA) (morpholino was also added for 
TPRXL) against target genes or negative con- 
trols was injected into human 1C embryos in 
parallel (Fig. 4D). RNA-seq analysis confirmed 
highly efficient KD for all factors (mRNAs 
dropped to 3.3 to 25.8% of control on average) 
(Fig. 4E). We also validated the depletion of 
TPRX1 and TPRXL by immunofluorescence 
(TPRX2 antibody is not available) (fig. S12, D 
and E). On day 3 (the 8C stage), development 
was slightly or moderately delayed for TPRX1 
KD embryos, TPRX2 KD embryos, TPRX1/2
double KD embryos, and TPRXL KD embryos 
(Fig. 4F and fig. S13, A and B). Simultaneous 
KD of TPRX1/2/L [TPRX triple KD (TKD)] led 
to a severe developmental delay on day 3 (Fig. 
4F and fig. S13B). Although 3PN embryos ex- 
hibited asynchronized development, such de- 
lay was reproduced in four batches (fig. S13C) 
and persisted to day 4 (morula stage) (Fig. 4F 
and fig. S13B) (two batches). We did not assess 
development beyond day 4 because of the low 
blastocyst rate even for the control 3PN em- 
bryos. Our data suggest that TPRXL and TPRX1/2 
together regulate human early development. 


TPRX factors are required for human ZGA 


Single-embryo RNA-seq analysis showed that 
the global transcriptome of TPRX TKD embryos 
on day 3 was altered, because it fell between the 
4C and 8C stages, lagging behind the control 
embryos along the developmental trajectory in a 
principal components analysis (Fig. 4G). By 
contrast, only small effects on the global transcriptome
were observed when knocking down
TPRX1, TPRX2, or LEUTX alone, and moderate 
effects were observed when knocking down 
TPRXL or TPRX1/2 (Fig. 4G and figs. S12B 
and S14, A and B). Detailed analysis showed 
that ZGA genes were predominantly down- 
regulated in TPRX TKD embryos, with ~31% 
of 8C-specifically activated genes and 36% of 
morula-specifically activated genes down- 
regulated at days 3 and 4, respectively (Fig. 5A). 
TPRX1/2 double KD and TPRXL KD also led to 
activation failure of ~15 to 21% of major ZGA 
genes, whereas KD of TPRX1 or TPRX2 alone 
only down-regulated ~2% of major ZGA genes, 
suggesting functional redundancy among TPRX 
members (Fig. 5B and fig. S14, A and B). For the
subsequent analyses, we mainly focused on the
TPRX TKD group. Down-regulated ZGA genes 
(both minor and major) included ZSCAN4, 
DUXB, DUXA, NANOGNB, DPPA4, GATA6, 
DPRX, ARGFX, DPRX, RBP7, and KLF5 (Fig. 5C 
and fig. S14B). DPPA4 was shown to function 
in activating 2C-like genes and developmental 
genes in mouse (56, 57). GATA6 plays a critical 
role in regulating early lineage specification (58). 
Conversely, maternal transcripts were slightly 
up-regulated (fig. S14B), consistent with human 
ZGA playing a critical role in maternal RNA 
decay (31). These data suggest that TPRXL and 
TPRX1/2 both contribute to ZGA and may ac- 
tivate downstream TFs to initiate the transcrip- 
tion cascade. 

In TPRX TKD embryos, the down-regulated, 
but not up-regulated, genes were enriched for 
the PRD-like binding motifs in their nearest 
distal open chromatin regions (putative enhan- 
cers) (Fig. 5D and fig. S14C), as determined 
based on the ATAC-seq dataset (24). Major 
ZGA genes that failed to be activated contain 
more PRD-like TF-binding motifs, especially 
in the nearest distal open chromatin regions, 
than those that were not affected (Fig. 5, A, 
right, compare red and blue bars, and E). Con- 
versely, ZGA genes with PRD-like TF motifs 
present in nearby distal open chromatin
were more likely to be down-regulated com- 
pared with those without motifs (Fig. 5F). Such 
features were also evident in TPRXL and TPRX1/2 
KD embryos, and to a much lesser extent in 
TPRX1 and TPRX2 single KD embryos (Fig. 5F). 
As a control, the PRD-like motif enrichment was 
not evident for either affected or unaffected 
morula-specifically activated genes in TPRX 
TKD embryos (Fig. 5A, right), suggesting that 
the morula program defects might be indirect- 
ly caused by the aberrant 8C program. Further- 
more, among genes down-regulated at 8C, 57% 
(124 of 217) were still not properly activated on 
day 4 (<50% of the expression level of both day 3
and day 4 control embryos), with some showing
little expression as exemplified by NANOGNB,
GATA6, ARGFX, DPRX, DPPA4, etc. (Fig. 5C). As
controls, LEUTX (a minor ZGA gene) and CTCF 
[a major ZGA gene (59)] were activated normal- 
ly. Furthermore, although the global transcrip- 
tome had clearly progressed from day 3 to 
day 4 toward morula for control embryos, 
TPRX TKD embryos appeared to be stuck near 
the 8C stage (Fig. 4G). In fact, even though two 
morula-like embryos appeared in TPRX TKD 
on day 4, their transcriptomes more closely resembled
the 8C stage (Fig. 4G, arrows). Therefore, genes 
with the TPRX motif in nearby regions were 
preferentially affected in TPRX TKD embryos, 
suggesting that the ZGA defects were unlikely 
to be due to a general transcription defect or 
developmental delay. 
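
Counting PRD-like motifs in a putative enhancer, as in the comparisons above, can be sketched as a scan for the TAATCC motif named earlier in the text on both strands. This is a simplified illustration using exact string matching; motif scanning with position weight matrices (for example, with HOMER, which the text cites for its motif database) would behave differently, and the example sequence is synthetic.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_motif(seq, motif="TAATCC"):
    """Count exact occurrences of `motif` on either strand of `seq`."""
    seq = seq.upper()
    targets = {motif, revcomp(motif)}  # a set, so palindromes are not double counted
    k = len(motif)
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))


# Synthetic region with one forward-strand and one reverse-strand motif.
region = "ACGTAATCCGTTGGATTAAC"
n_motifs = count_motif(region)
```

Applying `count_motif` to the distal open chromatin region nearest each gene would give per-gene motif numbers that can then be compared between affected and unaffected gene sets.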

To identify the possible TPRX-binding tar- 
get genes, we overexpressed Flag-tagged TPRX 
factors in H9 hESCs (fig. S15A). TPRX1 or TPRX2
overexpression up-regulated 679 and 597 genes, respectively
[with a fold change ≥1.5 (adjusted P < 0.05) or a
fold change ≥3]. These up-regulated genes
were enriched for PRD-like motif in their as- 
sociated distal open chromatin in human early 
embryos compared with down-regulated genes 
and random genes (fig. S15B). We observed
strong binding of TPRX1 and TPRX2 in hESCs, 
but not in enhanced GFP (EGFP)-overexpressing 
control cells, using small-scale Tn5-associated 
chromatin cleavage with sequencing (Stacc- 
seq) (60) (Fig. 6A and fig. S15C). However, we 
failed to detect chromatin binding of TPRXL 
in hESCs under the same conditions. Both the 
TPRX1- and TPRX2-binding sites were strongly 
enriched for the expected PRD-like TF motifs 
(Fig. 6B and table S4). The binding of TPRX1 
and TPRX2 correlated with each other genome 
wide, suggesting similar targeting preferences 
(fig. S15D). Genes down-regulated upon TPRX 
TKD in human embryos were preferentially 
bound by TPRX1/2 in hESCs compared with 
those that were up-regulated or not affected 
(Fig. 6C). TPRX1 and TPRX2 binding was 
found near the promoters of DPPA3, ZSCAN4, 
KLF5, and DUXB (Fig. 6A). In sum, TPRXs 
preferentially regulate ZGA genes harboring 
their binding motifs. 
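
One common way to quantify a statement such as "preferentially bound" is a hypergeometric tail probability: given a background of genes, some of which are bound, how likely is it that a gene set of a given size contains at least the observed number of bound genes by chance? The sketch below implements that tail with exact integer arithmetic. The statistical test actually used for Fig. 6C is not stated in this section, so this is an assumption, and the numbers in the example are toy values.

```python
from fractions import Fraction
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when `draws` genes are sampled without replacement from
    `population` genes, of which `successes` are bound."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return float(Fraction(tail, total))


# Toy example: 10 genes, 4 bound; a set of 3 genes contains at least 2 bound ones.
p_value = hypergeom_tail(k=2, population=10, successes=4, draws=3)
```

Exact fractions keep the sum accurate for small probabilities, although the final conversion to a float can still underflow to 0.0 for extremely strong enrichments.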


Ectopic expression of TPRXs activates key 
human ZGA marker genes 


RNA-seq analysis in TPRX-overexpressing hESCs
revealed that genes activated by TPRX1 (N = 679)
and TPRX2 (N = 597) showed significant overlap
(N = 324, P < 2.2 × 10^-16) (Fig. 6D and fig.
S16, A and B), again suggesting that TPRX1 and 
TPRX2 may function redundantly. Among these 
genes activated by both proteins, a small but 
significant number of genes (N = 24, P = 3.0 x
10°) were minor or major ZGA genes such as 
DUXB, HESX1, KLF5, RBP7, ZSCAN4, etc. (Fig. 6, 
D and E). Half of these ZGA genes were also 
down-regulated in TPRX TKD embryos (N = 12, 
P= 4.5 x 10°’) (including RBP7, KLF5, and 
ZSCAN4). DPPA3, a marker for naive hESCs 
(61) and a highly expressed maternal gene that 
may drive DNA hypomethylation in embryos 
(62, 63), was strongly bound and activated by 
TPRX1 and TPRX2 (Fig. 6, A and E). As a control,
CTCF was neither bound nor activated by
TPRX1 or TPRX2 in hESCs (Fig. 6A). DUX4 
was also not activated (Fig. 6E). TPRX2 could 
also activate TPRX1 but not vice versa (Fig. 6E).
Weak but evident binding of both TPRX1 and 
TPRX2 was found at the promoter and puta- 
tive enhancers of TPRX1 (fig. S16C, shaded). 
TPRXL barely activated genes in primed hESCs, 
and overexpressing TPRXL in naive hESCs
resulted in increased but still limited gene
activation (N = 23), which might have been
caused by the lack of necessary cofactors in 
hESCs (Fig. 4A and fig. S16B). In sum, these 
data indicate that ectopic TPRXs can activate
a subset of ZGA genes in hESCs.

Fig. 4. TPRX factors are required for human embryonic development. (A) Schematic of protein sequence
alignment of TPRXs based on the European Molecular Biology Laboratory-European Bioinformatics Institute multiple
sequence alignment tool. Regions in red, yellow, and blue indicate high, medium, and low similarity, respectively.
Slashed regions show high similarity between TPRX1 and TPRX2. (B) The University of California Santa Cruz (UCSC)
browser views of RPF and mRNA for TPRXL at the 4C stage. Two predicted ORFs are shown below. Dark blue shows
homeobox domain. Green and red bars mark the start and stop codons, respectively. (C) TPRXL immunostaining
in FGO; MII oocytes; 2C, 4C, 8C, and 3PN embryos; and blastocysts (BL). (D) Schematic of gene KD experiments.
3PN zygotes were injected by siRNA (for LEUTX/TPRX1/2/L) and morpholino (for TPRXL) or control siRNA and
morpholino. RNA-seq samples were collected on day 3 (D3) or day 4 (D4). (E) Bar charts showing mRNA levels of
TPRX1/2/L in control and KD embryos. The error bar denotes the SEM. (F) Developmental rates of TPRX1 KD (N = 10),
TPRX2 KD (N = 13), TPRX1/2 KD (N = 12), TPRXL KD (N = 11), and TPRX1/2/L TKD embryos collected on D3 (N = 26)
or D4 (N = 12) after microinjection. TPRX1 KD and TPRX2 KD shared the same control (N = 10), and TPRX1/2 KD
shared the same control (N = 15) with LEUTX KD. TPRXL KD and TPRX1/2/L TKD shared the same control (N = 34)
on D3. TPRX1/2/L TKD had its own control (N = 14) on D4. (G) PCA analysis of the transcriptome of 1C/2C/4C/8C/
morula/ICM (without injection), KD and control embryos on D3 or D4. Arrested embryos in which ZGA completely failed, which
indicates abnormal development of low-quality embryos, were removed from the transcriptome analysis. Two morulae appeared
in TPRX1/2/L TKD embryos, yet their transcriptomes were still similar to those of 8C embryos (arrows). 
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Fig. 5. TPRX factors are A 
required for human ZGA. 

(A) Heatmaps showing the aver- 
age mRNA levels of 4C to ICM 
stage-specifically activated ZGA 
genes in TPRX1/2/L TKD and 
control embryos on D3 and D4. 
PRD-like motif numbers at the 8C 
proximal or distal open chromatin 
region near each gene promoter 
are also shown. Genes failed to 
be activated and normally activated 
are labeled by the red and blue bars, 
respectively. MO, morula. FC, fold 
change. (B) Percentages of 8C 
activated major ZGA genes that 
failed to be activated in KD em- 
bryos on D3. (C) mRNA levels 
(single embryo) of example genes 
significantly down-regulated upon Cc 
TPRX1/2/L TKD in control or TKD 
embryos. Genes not affected are 
shown as controls (LEUTX and 
CTCF). (D) Motif enrichment in the 
8C nearby distal open regions 
based on 8C ATAC-seq (24)] 


(E) Violin plots showing the num- 
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nearby proximal or distal open D 
chromatin regions for genes that 
failed to be activated or were 
normally activated in TPRX1/2/L 
TKD embryos on D3. (F) Violin 
plots showing the mRNA fold 
changes (log2 ratio) for major ZGA 
genes with (red) or without (blue) 
a PRD-like TF motif at nearby 
putative enhancers in KD embryos 
on D3 and D4. P values (unpaired 
two-sample t test) for (C), (E), 
and (F): *P < 0.05; **P < 0.01; 
***P < 0.001; ****P < 0.0001; 
ns, not significant. 
Discussion

Nearly half of human-mouse homologous genes 
showed distinct translation patterns during 
OET, highlighting species-specific regulation 
and cautioning against direct extrapolation 
of data from model organisms to human. Such 
differences likely reflect the evolutionary adap- 
tation to species-specific needs. Host defensive 
responses to the transposon activity appear 
to be a major source of the differences be- 
tween species, which is in agreement with the 
notion that the domestication of transposons 
varies among different species, and diverse 
ZNFs evolve in each species during “arms races” 
with transposons (41, 64). Mouse, but not human,
exhibits widespread maternal RNA destabilization
in MII oocytes, which correlates
with the more-extensive deadenylation in mouse 
(32, 33). Given that these genes generally contain
fewer CPEs and shorter 3'UTRs in mouse
than in human, we propose that such early 
destabilization in mouse MII oocytes may be 
intrinsically coded in anticipation of an un- 
usually early ZGA compared with other mam- 
mals. A substantial fraction of these genes 
undergo retranslation around ZGA in both 
human and mouse embryos, underscoring a 
potentially underappreciated role for recy- 
cling these transcripts for embryonic develop- 
ment. CPE configuration (especially papCPEs) 
appears to contribute to the differential trans- 
lation between mouse and human oocytes. 
These data emphasize a critical role of 3'UTR
diversity in regulating species-specific RNA 
translation and stability. Such variability may 
stem from gene rearrangements, local nucleo- 
tide changes, transposition of insertion se- 
quences, or alternative polyadenylation (65). 

PRD-like TFs have been implicated in hu- 
man ZGA and preimplantation development 
based on expression timing, phylogenetic anal- 
ysis, and overexpression experiments in hESCs 
(42, 43, 47, 49). TPRX1 was recently recognized 
as a marker of 8C-like cells in hESCs (66, 67), 
and regulates transcription during the 8C-like 
cell conversion (67). Moreover, TPRXs do not ap- 
pear to regulate DUX4 in embryos and hESCs. 

respectively. The maternal factor TPRXL and minor ZGA factors TPRX1/2 regulate human ZGA and early embryonic development. The solid arrow indicates the
possible direct target genes that were bound and activated by TPRXs in hESCs and down-regulated in TPRX KD embryos, including DUXB, ZSCAN4 (minor ZGA genes),
KLF5 (major ZGA genes), and DPPA3. The dotted arrows indicate potential target genes that were down-regulated in TPRX1/2/L TKD embryos but not found to
be activated or bound by ectopic TPRX1/2 in hESCs, including NANOGNB and DUXA (minor ZGA genes) and ARGFX, DPRX, GATA6, DPPA4, and RBP7 (major ZGA
genes). Ectopic TPRX2 could bind and activate TPRX1 in hESCs.

Although ectopically expressed DUX4 was shown
to increase the percentage of TPRX1-positive
cells in the hESC population (66), it does not 
bind directly at TPRX1 (68). We postulate that
different TFs may work in parallel to activate 
genes with corresponding motifs. Future stud- 
ies are warranted to understand the molecular 
mechanisms underlying their actions and in- 
terplay during ZGA. We envision that our study 
will pave the way for understanding the mech- 
anisms underlying the cascade of transcription 
at the beginning of life, which will benefit hu- 
man infertility research. 


Zou et al., Science 378, eabo7923 (2022) 


Methods summary 

A detailed materials and methods section is 
provided in the supplementary materials. In 
brief, human studies were approved by the ethics 
boards of participating institutions. Human im- 
mature oocytes from in vitro fertilization or 
intracytoplasmic sperm injection treatments 
were clinically discarded and donated by 20- 
to 38-year-old women after signing informed 
consent. In vitro maturation of oocytes was 
conducted following protocols described previously
(69). The oocyte and embryo vitrifications
were performed as described previously
(70). High-quality embryos were selected for this
study according to morphological criteria (71).

R2-lite integrated Ribo-lite (18) and Smart-seq2
(19), as described previously. Briefly, oocytes
and embryos were lysed in 20 μl of ice-cold
Ribo-lite lysis buffer. Next, 2 μl of the lysed samples
was subjected to Smart-seq2, and the remaining
sample was subjected to Ribo-lite profiling.
mRNA-seq data and Ribo-lite data were analyzed
separately, as described previously with
several modifications (18).
For the 3'UTR reporter assay, the 3'UTR sequences
for DROSHA/Drosha and NUP62/Nup62 were cloned
from the human or mouse oocyte cDNA library.
Reporter plasmid construction, mutation, and
mRNA preparation were conducted as described
previously with several modifications (18).

CHX treatment experiments were performed 
on human 3PN zygotes. Embryos were cul- 
tured to day 5 in the presence of dimethyl 
sulfoxide (DMSO) or CHX. The developmental 
phenotype was recorded with microscopy, and 
each embryo was subjected to Smart-seq2 li- 
brary preparation. To investigate the function 
of LEUTX and TPRX1/2/L in human ZGA, the 
siRNAs and morpholinos targeting these genes 
were microinjected into 3PN zygotes. The in- 
jected embryos with normal morphology were 
harvested on days 3 and 4 after fertilization for 
Smart-seq2 library preparation. 

Primed hESCs were maintained on Matrigel- 
coated plates in TeSR-E8 medium (STEMCELL 
Technologies). Naive-state hESCs were gener- 
ated as described previously (72) with some 
modifications. For the construction of vectors 
containing the human TPRX1, TPRX2, and TPRXL
genes, the coding sequences were synthesized or 
cloned from human 4C and 8C embryo cDNAs 
and inserted into the PiggyBac vector or the 
FUW lentivirus vector for doxycycline-inducible 
expression. Stacc-seq was used for Flag-tagged 
TPRX1/TPRX2/EGFP in hESCs with the wash 
steps, and the data were analyzed as described 
previously (60). 
INTRODUCTION: Investment in Africa over the
past year with regard to severe acute respira- 
tory syndrome coronavirus 2 (SARS-CoV-2) 
sequencing has led to a massive increase in 
the number of sequences, which, to date, ex- 
ceeds 100,000 sequences generated to track 
the pandemic on the continent. These se- 
quences have profoundly affected how public 
health officials in Africa have navigated the 
COVID-19 pandemic. 


RATIONALE: We demonstrate how the first 
100,000 SARS-CoV-2 sequences from Africa 




have helped monitor the epidemic on the conti- 
nent, how genomic surveillance expanded 
over the course of the pandemic, and how 
we adapted our sequencing methods to deal 
with an evolving virus. Finally, we also ex- 
amine how viral lineages have spread across 
the continent in a phylogeographic frame- 
work to gain insights into the underlying 
temporal and spatial transmission dynam- 
ics for several variants of concern (VOCs). 


Expanse of SARS-CoV-2 sequencing capacity in Africa. (A) African countries (shaded in gray) and
institutions (red circles) with on-site sequencing facilities that are capable of producing SARS-CoV-2 whole
genomes locally. (B) The number of SARS-CoV-2 genomes produced per country and the proportion of
those genomes that were produced locally, regionally within Africa, or abroad. (C) Decreased turnaround time
of sequencing output in Africa to an almost real-time release of genomic data.

RESULTS: Our results indicate that the number
of countries in Africa that can sequence the
virus within their own borders is growing and
that this is coupled with a shorter turnaround 
time from the time of sampling to sequence 
submission. Ongoing evolution necessitated 
the continual updating of primer sets, and, as 
a result, eight primer sets were designed in tan- 
dem with viral evolution and used to ensure 
effective sequencing of the virus. The pandemic 
unfolded through multiple waves of infection 
that were each driven by distinct genetic lineages, 
with B.1-like ancestral strains associated with 
the first pandemic wave of infections in 2020. 
Successive waves on the continent were fueled 
by different VOCs, with Alpha and Beta cocir- 
culating in distinct spatial patterns during the 
second wave and Delta and Omicron affecting 
the whole continent during the third and fourth 
waves, respectively. Phylogeographic reconstruc- 
tion points toward distinct differences in viral 
importation and exportation patterns associ- 
ated with the Alpha, Beta, Delta, and Omicron 
variants and subvariants, when considering 
both Africa versus the rest of the world and viral 
dissemination within the continent. Our epide- 
miological and phylogenetic inferences there- 
fore underscore the heterogeneous nature of 
the pandemic on the continent and highlight 
key insights and challenges, for instance, rec- 
ognizing the limitations of low testing pro- 
portions. We also highlight the early warning 
capacity that genomic surveillance in Africa 
has had for the rest of the world with the de- 
tection of new lineages and variants, the most 
recent being the characterization of various 
Omicron subvariants. 


CONCLUSION: Sustained investment for diag- 
nostics and genomic surveillance in Africa is 
needed as the virus continues to evolve. This 
is important not only to help combat SARS- 
CoV-2 on the continent but also because it can 
be used as a platform to help address the many 
emerging and reemerging infectious disease 
threats in Africa. In particular, capacity build- 
ing for local sequencing within countries or 
within the continent should be prioritized 
because this is generally associated with 
shorter turnaround times, providing the most 
benefit to local public health authorities 
tasked with pandemic response and mitiga- 
tion and allowing for the fastest reaction to 
localized outbreaks. These investments are 
crucial for pandemic preparedness and re- 
sponse and will serve the health of the con- 
tinent well into the 21st century. 
Investment in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequencing in Africa
over the past year has led to a major increase in the number of sequences that have been generated 
and used to track the pandemic on the continent, a number that now exceeds 100,000 genomes. 

Our results show an increase in the number of African countries that are able to sequence domestically and 
highlight that local sequencing enables faster turnaround times and more-regular routine surveillance. 
Despite limitations of low testing proportions, findings from this genomic surveillance study underscore the 
heterogeneous nature of the pandemic and illuminate the distinct dispersal dynamics of variants of concern— 
particularly Alpha, Beta, Delta, and Omicron—on the continent. Sustained investment for diagnostics 
and genomic surveillance in Africa is needed as the virus continues to evolve while the continent faces 
many emerging and reemerging infectious disease threats. These investments are crucial for pandemic 
preparedness and response and will serve the health of the continent well into the 21st century. 


What originally started as a small cluster
of pneumonia cases in Wuhan, China,
more than 2 years ago (1) quickly turned
into a global pandemic. COVID-19 is the 
clinical manifestation of severe acute 
respiratory syndrome coronavirus 2 (SARS-CoV-2) 
infection, and by March 2022, there had been 
more than 437 million reported cases and more 
than 5.9 million reported deaths (2). Although 
Africa accounts for the lowest number of re- 
ported cases and deaths thus far, with ~11.3 mil- 
lion reported cases and 245,000 reported deaths 
as of February 2022, the continent has played 
an important role in shaping the scientific re- 
sponse to the pandemic with the implementation 
of genomic surveillance and the identification of 
two of the five variants of concern (VOCs) (3, 4).
Since it emerged in 2019, SARS-CoV-2 has 
continued to evolve and adapt (5). This has led 
to the emergence of several viral lineages that 
carry mutations that either confer some viral 
adaptive advantages that increase transmis- 
sion and infection (6, 7) or counter the effect 
of neutralizing antibodies from vaccination 
(8) or previous infections (9-11). The World 
Health Organization (WHO) classifies certain 
viral lineages as VOCs or variants of interest 
(VOIs) based on the potential impact they may 
have on the pandemic, with VOCs regarded as 
the highest risk. To date, five VOCs have been 
classified by the WHO; of these, two were first 
detected on the African continent (Beta and 
Omicron) (3, 4, 12) and two (Alpha and Delta) 
(12, 13) have spread extensively on the conti- 
nent in successive waves. The remaining VOC, 
Gamma (14), originated in Brazil and had a limited
influence in Africa, with only four recorded
sequenced cases. 
For genomic surveillance to be useful for 
public health responses, sampling for sequencing
needs to be both spatially and temporally
representative. In the case of SARS-CoV-2 in 
Africa, this means extending the geographic 
coverage of sequencing capacity to capture the 
dynamic genomic epidemiology in as many 
locations as possible. In a meta-analysis of the 
first 10,000 SARS-CoV-2 sequences generated 
in 2020 from Africa (15), several blind spots were
identified with regard to genomic surveillance 
on the continent. Since then, much investment 
has been devoted to building capacity for ge- 
nomic surveillance in Africa, coordinated mostly 
by the Africa Centers for Disease Control (Africa 
CDC) and the regional office of the WHO in 
Africa (or WHO AFRO) but also provided by 
several national and international partners, 
resulting in an additional 90,000 sequences 
shared over the past year (April 2021 to March 
2022). This makes the sequencing effort for 
SARS-CoV-2 a phenomenal milestone. In comparison,
only 12,000 whole-genome influenza
sequences (16) and only ~3700 whole-genome
HIV sequences (17) from Africa have been shared
publicly, even though HIV has plagued the continent
for decades.

Here, we describe how the first 100,000 SARS- 
CoV-2 sequences from Africa have helped de- 
scribe the pandemic on the continent, how this 
genomic surveillance in Africa has expanded, 
and how we adapted our sequencing meth- 
ods to deal with an evolving virus. We also 
highlight the impact that genomic sequencing 
in Africa has had on the global public health 
response, particularly through the identifica- 
tion and early analysis of new variants. Finally, 
we also describe here how the Delta and Omicron 
variants have spread across the continent and 
how their transmission dynamics were dis- 
tinct from the Alpha and Beta variants that 
preceded them. 
Results

Epidemic waves driven by variant dynamics and geography

Scaling up sequencing in Africa has provided a 
wealth of information on how the pandemic 
unfolded on the continent. The epidemic has 
largely been spatially heterogeneous across 
Africa, but most countries have experienced 
multiple waves of infection (18-29), with sub- 
stantial local and regional diversity in the first 
wave and to a lesser extent in the second wave, 
followed by successive sweeps of the continent 
with Delta and Omicron (Fig. 1A). In all re- 
gions of the continent, different lineages and 
VOIs evolved and cocirculated with VOCs and, 
in some cases, contributed considerably to epi- 
demic waves. 

In North Africa (Fig. 1B and fig. S1A), B.1
lineages and Alpha dominated in the first and 
second waves of the pandemic and were re- 
placed by Delta and Omicron in the third and 
fourth waves, respectively. Interestingly, the 
C.36 and C.36.3 sublineages dominated the 
epidemic in Egypt (~40% of reported infections) 
before July 2021 when they were replaced by 
Delta (30). Similarly, in Tunisia, the first and 
second waves were associated with the B.1.160 
lineage and were replaced by Delta during the 
country’s third wave of infections. In southern 
Africa (Fig. 1C and fig. S1C), we see a similar
pandemic profile, with B.1 dominating the 
first wave; however, instead of Alpha, Beta was 
responsible for the second wave, followed by 
Delta and Omicron. Another lineage that was 
flagged for close monitoring in the region was 
C.1.2 because of its mutational profile and predicted
capacity for immune escape (31). However,
the C.1.2 lineage did not cause many infections 
in the region because it was circulating at a time 
when Delta was dominant. In West Africa (Fig. 
1D and fig. S1B), the B.1.525 lineage caused a
large proportion of infections in the second 
and third waves, where it shared the pandemic 
landscape with the Alpha variant. As with other 
regions on the continent, these variants were 
later replaced by the Delta and then the Omicron 
VOCs in successive waves. In Central Africa 
(Fig. 1E and fig. S1D), the B.1.620 lineage caused 
most of the infections between January and 
June 2021 (32) before systematically being 
replaced by Delta and then Omicron. Lastly, 
in East Africa (Fig. 1F and fig. S1E), the A.23.1
lineage dominated the second wave of infec- 
tions in Uganda (33) and much of East Africa. 
In all of these regions, minor lineages such as 
B.1.525, C.36, and A.23.1 were eventually re- 
placed by VOCs that emerged in later waves. 

Finally, we directly compared the official 
recorded cases in Africa with the ongoing SARS- 
CoV-2 genomic surveillance data (GISAID date 
of access: 31 March 2022) for a crude estimation 
of the variants’ contributions to cases. We ob- 
serve that Delta was responsible for an epi- 
demic wave between May and October 2021 


RESEARCH | RESEARCH ARTICLE

[Fig. 1 graphic: reported daily cases per million and genomes sampled monthly, by variant (Alpha, Beta, Delta, Eta, Omicron, others) and sampling date, for Africa in total and for selected countries including Ghana, Nigeria, and Senegal.]


Fig. 1. Epidemiological progression of the COVID-19 pandemic on the
African continent. (A) Total reported new case counts per million inhabitants
in Africa (data source: Our World in Data; log-transformed) along with
the distribution of VOCs, the Eta VOI, and other lineages through time (the
size of each circle is proportional to the number of genomes sampled per
month for each category). (B to F) Breakdown of reported new cases


(Fig. 1A) and had the greatest impact on the 
continent, with almost 34.2% of overall in- 
fections in Africa possibly attributed to it. 
Beta was responsible for an epidemic wave at 
the end of 2020 and beginning of 2021 (Fig. 
1A), with 13.3% of infections overall attributed 
to it. Notably, Alpha, despite being predomi- 
nant in other parts of the world at the begin- 
ning of 2021, had only minimal importance in 
Africa, accounting for just 4.3% of infections. 
At the time of writing, the Omicron VOC had 
contributed to 21.6% of the overall number of 
sequenced infections. At this time, the Omicron 
wave was still unfolding globally and in Africa 
with the expansion of several sublineages (34), 
such that its full impact is yet to be determined. 
However, because of increased population im- 
munity (35) from SARS-CoV-2 infection and 
vaccination (fig. S2), the impact of Omicron on 
mortality has been less in comparison to the 
other VOCs, as can be observed by the relatively 
low death rate in South Africa during the 
Omicron wave (36). The findings from mapping
epidemiological numbers onto genomic surveillance
data are reliable insofar as genomic sampling
across Africa scales proportionally with the size
and timing of epidemic waves [fig. S3; model
estimate (b) = 0.011, standard error (SE) = 0.001,
p < 2 × 10⁻¹⁶].
This comes with the obvious caveats that 
testing and reporting practices have varied 
widely across the continent along with ge- 
nomic surveillance volumes throughout the 
pandemic. Countries in Africa with reported 
data have tested in proportions from as little as 
0.1 daily tests per million population to more 
than 1000 tests per million (fig. S4). Some coun- 
tries have consistently tested at high proportions, 
for example, South Africa, Botswana, Morocco, 
and Tunisia. Incidentally, these countries have 
also generally reported more cases per million 
population, providing an indication that recorded 
low incidences in other parts of the continent 
have been underestimates due to low testing 
rates. However, even for these countries, epi- 
demic numbers are certainly underrepresented 
and underdetected, given that in several time 
frames, test positivity rates were still on the 
per million (data source: Our World in Data; log-transformed) and monthly
sampling of VOCs, regional variant, or lineage of interest and other 
lineages for three selected countries for North, southern, West, Central, 
and East Africa, respectively. For each region, a different variant or lineage 
of interest is shown, relevant to that region (C.36, C.1.2, Eta, B.1.620, and 


higher end, approaching or exceeding 20% (fig. 
S4), and as concluded by seroprevalence sur- 
veys and estimates of true infection burdens in 
Africa (37, 38). Findings of attributing case 
numbers of variants must therefore be inter- 
preted in the context of this limitation but can 
nevertheless provide a qualitative overview of 
the spatial and temporal dynamics of VOCs in 
relation to epidemic progression in Africa. 
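
The crude attribution described above can be sketched as follows: for each month, reported cases are split among variants in proportion to the genomes sampled in that month. This is only a minimal illustration under stated assumptions, not the authors' pipeline; the function name `attribute_cases`, the handling of months without genomes, and all numbers are hypothetical.

```python
from collections import defaultdict


def attribute_cases(monthly_cases, monthly_genomes):
    """Split reported cases among variants in proportion to sampled genomes.

    monthly_cases:   {month: reported case count}
    monthly_genomes: {month: {variant: genome count}}
    Returns {variant: estimated share of all reported cases, in percent}.
    Months without any genomes stay unattributed (an assumption of this sketch).
    """
    attributed = defaultdict(float)
    for month, cases in monthly_cases.items():
        genomes = monthly_genomes.get(month, {})
        total_genomes = sum(genomes.values())
        if total_genomes == 0:
            continue  # no genomic data for this month
        for variant, count in genomes.items():
            attributed[variant] += cases * count / total_genomes
    total_cases = sum(monthly_cases.values())
    return {v: 100.0 * n / total_cases for v, n in attributed.items()}


# Hypothetical toy data, not values from the study.
cases = {"2021-06": 1000, "2021-07": 3000}
genomes = {
    "2021-06": {"Beta": 30, "Delta": 10},
    "2021-07": {"Beta": 10, "Delta": 90},
}
shares = attribute_cases(cases, genomes)
```

Because the estimate inherits every bias in testing and sampling, it is best read as a qualitative summary, in line with the caveats stated in the text.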
The African regional (table S1) and country- 
specific (table S2) NextStrain builds also clearly 
support the changing nature of the pandemic 
over time. From these builds, we observe a strong 
association of B.1-like viruses circulating on the 
continent during the first wave. These “ances- 
tral” lineages were subsequently replaced by the 
Alpha and Beta variants, which dominated the 
pandemic landscape during the second wave 
and were later replaced by the Delta and Omicron 
variants during the third and fourth waves. 


Optimizing surveillance coverage in Africa 


By mapping and comparing the locations of spec- 
imen sampling laboratories to the sequencing 
laboratories, a number of aspects regarding
the expansion of genomic surveillance on the 
continent became clear. First, even though 
several countries in Africa started sequencing 
SARS-CoV-2 in the first months of the pan- 
demic, local sequencing capacity was initially 
limited. However, local sequencing capabilities
slowly expanded over time, particularly
after the emergence of VOCs (Fig. 2A). The fact 
that almost half of all SARS-CoV-2 sequenc- 
ing in Africa was performed using the Oxford 
Nanopore Technology (ONT), which is rela- 
tively low-cost compared with other sequenc- 
ing technologies and better adapted to modest 
laboratory infrastructures, illustrates one component
of how this rapid scale-up of local
sequencing was achieved (fig. S5). Yet, to rely 
only on local sequencing would have thwarted 
the continent’s chance at a reliable genomic 
surveillance program. At the time of writing, 
52 of 55 countries in Africa had SARS-CoV-2 
Fig. 2. Sequencing strategies and outputs in Africa. (A) Geographical
representation of all countries (shaded in gray) and institutions (red dots) in
Africa with their own on-site sequencing facilities. The inset graph shows
the number of countries in Africa that are able to carry out sequencing locally
over time. (B) Key regional sequencing hubs and networks in Africa showing 
countries (shaded in bright colors) and institutions (red dots) that have 
sequenced for other countries (shaded in corresponding light colors and linking 
curves) on the continent. ACEGID, African Centre of Excellence for Genomics 
of Infectious Diseases; CERI, Centre for Epidemic Response and Innovation; 
KEMRI-WT, Kenya Medical Research Institute-Wellcome Trust; KRISP, KwaZulu- 
Natal Research Innovation and Sequencing Platform; ILRI, International 
Livestock Research Institute; INRB, Institut National de Recherche Biomédicale; 
IPD, Institut Pasteur de Dakar; MRC/UVRI, Medical Research Council/Uganda 
Virus Research Institute; MRCG, Medical Research Council Unit-The Gambia; 
NICD, National Institute for Communicable Diseases; NMIMR, Noguchi Memorial 
Institute for Medical Research. (C) Geographical representation of the total
number of SARS-CoV-2 whole genomes produced over the course of the
pandemic in each country, as well as the proportion of those sequences that
were produced locally, regionally, or abroad. (D) Correlation of the proportion of
COVID-19 positive cases that have been sequenced and the corresponding
number of epidemiological weeks since the start of the pandemic that are
represented with genomes for each African country. The color of each circle
represents the number of cases and its size the number of genomes.
(E) Comparison of sequencing turnaround times (lag times from sample
collection to sequence submission) for the three strategies of sequencing in
Africa, showing a significant difference in the means (****p < 0.0001). The box
and whisker plot denotes the lower quartile, the median and upper quartiles
(box), the minimum and maximum values (whiskers), and the outliers
(black dots). (F) Pearson correlations of the total number of sequencing
laboratories per country against key sequencing outputs.




genomes deposited in GISAID; however, there 
were still 16 countries with no reported local 
sequencing capacity (Fig. 2A) and undoubtedly 
many with limited capacity to meet demand 
during pandemic waves. 

To tackle this, three centers of excellence 
and various regional sequencing hubs were 
established to maximize the resources avail- 
able in a few countries to assist in genomic 
surveillance across the continent. This se- 
quencing is done either as the sole source of 
viral genomes for those countries (e.g., Angola, 
South Sudan, and Namibia) or concurrently 
with local efforts to increase capacity during 
resurgences (Fig. 2B). Sequencing is further 
supplemented by a number of countries that 
use facilities outside of Africa. Ultimately, a 
mix of strategies from local sequencing, col- 
laborative resource sharing among African 
countries, and sequencing with academic col- 
laborators outside the continent helped close 
surveillance blind spots (Fig. 2C). Countries in 
sub-Saharan Africa, particularly in southern 
and East Africa, most benefited from the re- 
gional sequencing networks, whereas coun- 
tries in West and North Africa often partnered 
with collaborators outside of Africa. 

The success of a pathogen genomic surveillance
program relies on how representative
it is of the epidemic under investigation. For
SARS-CoV-2, this is often measured in terms 
of the percentage of reported cases sequenced 
and the regularity of sampling. African coun- 
tries were positioned across a range of differ- 
ent combinations of overall proportion and 
frequency of genomic sampling (Fig. 2D). Al- 
though the ultimate goal would be to optimize 
both of these parameters, a lower proportion 
of sampling can also be useful if the frequency 
of sampling is maintained at as high a level as 
possible. For instance, South Africa and Nigeria, 
which have both sequenced ~1% of cases over- 
all, can be considered to have successful ge- 
nomic surveillance programs based on the fact 
that sampling is representative over time and 
has enabled the timely detection of variants 
(Beta, Eta, Omicron). 
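
The two representativeness measures discussed above, the percentage of reported cases sequenced and the regularity of sampling, can be illustrated with a short sketch. It assumes that ISO calendar weeks are an acceptable stand-in for epidemiological weeks (conventions differ between agencies); the function name `surveillance_metrics` and the toy data are hypothetical.

```python
from datetime import date


def surveillance_metrics(reported_cases, sampling_dates):
    """Return (percent of reported cases sequenced, distinct weeks with genomes).

    sampling_dates holds one collection date per sequenced genome.
    ISO weeks approximate epidemiological weeks here (an assumption).
    """
    percent_sequenced = 100.0 * len(sampling_dates) / reported_cases
    # (ISO year, ISO week) pairs identify each sampled week uniquely.
    weeks = {tuple(d.isocalendar())[:2] for d in sampling_dates}
    return percent_sequenced, len(weeks)


# Hypothetical toy data: 4 genomes out of 400 reported cases.
dates = [date(2021, 1, 4), date(2021, 1, 6), date(2021, 1, 12), date(2021, 2, 1)]
pct, weeks_covered = surveillance_metrics(400, dates)
```

A country can score low on the first measure and still do well on the second, which is the situation the text describes for programs that sequence about 1% of cases but sample regularly.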

Additionally, for genomic surveillance to be 
most useful for rapid public health response 
during a pandemic, sequencing would ideally 
be done in real time or in a framework as close 
as possible to that. We show a general trend of 
decreasing sequencing turnaround time in 
Africa (fig. S6), particularly from a mean of 
182 days between October and December 2020 
to a mean of 50 days over the same period a 
year later, although this does come with several 
caveats. First, we measure sequencing turn- 
around time in the most accessible manner, 
which is by comparing the date of sampling of 
a specimen to the date its sequence was de- 
posited in GISAID. Generally, the genomic data 
potentially informs the public health response 
more rapidly than reflected here, particularly 
when it comes to local outbreak investigations
or variant detection. This analysis is also con- 
founded by various factors such as country- 
to-country variation in these trends (fig. S7), 
delays in data sharing, and potential retro- 
spective sequencing, particularly by countries 
that joined sequencing efforts at later stages 
of the pandemic. The most critical caveat is 
the fact that sequencing from the most recently 
collected samples (e.g., over the past 6 months) 
may still be ongoing. The shortening duration 
between sampling and genomic data sharing 
is nevertheless a positive takeaway, given that 
these data also feed into continental and global 
genomic monitoring networks. Overall, the con- 
tinental average delay from specimen collec- 
tion to sequencing submission is 87 days, with 
10 countries having an average turnaround 
time of less than 60 days and Botswana of less 
than 30 days (fig. S8). 
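
The turnaround measure used here, the lag between specimen collection and sequence deposition, can be computed as in the sketch below. The record layout, the function names, and the dates are hypothetical and serve only to illustrate the calculation.

```python
from datetime import date
from statistics import mean


def turnaround_days(collected, submitted):
    """Lag in days between specimen collection and sequence submission."""
    return (submitted - collected).days


def mean_turnaround(records, start, end):
    """Mean lag for records whose collection date lies in [start, end].

    records is a list of (collection_date, submission_date) pairs.
    Returns None when no record falls inside the window.
    """
    lags = [turnaround_days(c, s) for c, s in records if start <= c <= end]
    return mean(lags) if lags else None


# Hypothetical toy records, not data from the study.
records = [
    (date(2021, 1, 1), date(2021, 1, 31)),   # 30 days
    (date(2021, 1, 10), date(2021, 3, 1)),   # 50 days
    (date(2021, 6, 1), date(2021, 6, 11)),   # outside the window below
]
window_mean = mean_turnaround(records, date(2021, 1, 1), date(2021, 3, 31))
```

Samples collected recently but not yet sequenced are absent from such records, so the mean for the latest window is biased downward, which is the critical caveat noted in the text.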

Most importantly, in the context of optimiz- 
ing genomic surveillance, we found that the 
route taken to sequencing affects the speed of 
data generation. Of the three frameworks we 
investigated, local sequencing has statistically 
faster sequencing turnaround times (median 
of 51 days), followed by sequencing within re- 
gional sequencing networks in Africa (median 
of 93 days) and finally outsourced sequencing 
to countries outside Africa (median of 113 days) 
(Fig. 2E). This finding strongly supports the 
investments in local genomic surveillance to 
generate timely and regular data for local and 
regional decision-making. Finally, we show 
that it is beneficial in several ways for coun- 
tries to undertake genomic surveillance through 
several sequencing laboratories rather than 
by centralizing efforts. For instance, we esti- 
mate strong correlations between the numbers 
of sequencing laboratories per country and the 
total number of genomes produced by that 
country (Pearson correlation, 0.75), the total 
number of epiweeks for which sequencing data 
was produced (Pearson correlation, 0.81), and, 
importantly, sequencing turnaround time 
(Pearson correlation, −0.37) (Fig. 2F). 
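
For readers unfamiliar with the statistic, the sketch below shows how a Pearson correlation between the number of sequencing laboratories per country and an output measure could be computed. The three toy vectors are invented for illustration and do not reproduce the coefficients reported above; the function name `pearson` is likewise an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical per-country values (one entry per country).
labs_per_country = [1, 2, 3, 4, 5]
genomes_produced = [200, 450, 500, 900, 1000]
median_turnaround_days = [110, 95, 100, 60, 70]

r_genomes = pearson(labs_per_country, genomes_produced)
r_turnaround = pearson(labs_per_country, median_turnaround_days)
```

In this toy input, `r_genomes` is positive and `r_turnaround` is negative, matching the direction (though not the magnitude) of the associations described in the text.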

With the increase in sequencing capacity on 
the continent, a decrease in the time taken to 
detect new variants was observed. For exam- 
ple, the Beta variant was identified in Decem- 
ber 2020 in South Africa (4), but sampling and 
molecular clock analyses suggest that the var- 
iant originated in September 2020. This 3-month 
lag in detection means that a new variant, like 
Beta, has ample time to spread over a large 
geographic region before its detection. How- 
ever, by the end of 2021, the time to detect a 
new variant was substantially improved. Phy- 
logenetic and molecular clock analyses suggest 
that the Omicron variant originated around 9 
October 2021 (95% highest posterior density: 
30 September to 20 October 2021), and the 
variant was described on 23 November 2021 
(3). Thus, Omicron was detected within ~5 weeks 
from origin compared with the Beta variant 
(~16 weeks) and the Alpha variant, which was 
detected in the United Kingdom (~10 weeks). 
More importantly, the time from sequence dep- 
osition to the WHO declaring the new variant 
a VOC was substantially shortened to 72 hours 
for the Omicron variant. 

To interpret insights from the described ge- 
nomic surveillance in Africa, it is important 
to understand the context of epidemiological 
reporting and sampling strategies used for 
sequencing on the continent (table S3). Most 
countries provided daily reports of newly 
recorded cases, whereas a few provided weekly 
and monthly reports. For most countries, sur- 
veillance was mainly focused on the major 
cities, suggesting potential cryptic circulation 
in rural areas. We find that at the onset of the 
pandemic, surveillance was focused on identi- 
fication of imported cases from incoming trav- 
elers or local residents returning from various 
countries. As community transmissions began 
to emerge, the focus shifted toward regular sur- 
veillance and outbreak investigations. Together, 
these three strategies account for the vast ma- 
jority of samples generated on the continent 
and analyzed here. As the pandemic progressed 
and vaccines were made available, some coun- 
tries on the continent began to explore other 
sampling strategies targeting reinfections, 
environmental samples such as wastewater, 
and vaccine breakthrough cases to gain 
new insights into the evolutionary dynamics 
of SARS-CoV-2. The utility of sequencing for 
viral evolution tracking and VOC detection in 
the way described above is obviously also de- 
pendent on sampling proportions, especially 
within sampling for regular surveillance. 

The speed of SARS-CoV-2 evolution has com- 
plicated sequencing efforts. Common methods 
of RNA sequencing include reverse transcrip- 
tion followed by double-stranded DNA ampli- 
fication using sequence-specific primer sets 
(39). Ongoing SARS-CoV-2 evolution has neces- 
sitated the continual evaluation and updating 
of these primer sets to ensure their sustained 
utility during genomic surveillance efforts. Here, 
we examined the current set of genomes to 
determine aspects of the sequencing process 
that might be improved in the future. Many of 
the primer sets that were used were designed 
using viral sequences from the start of the pan- 
demic and may require updating to keep pace 
with evolution. Indeed, the ARTIC primer sets 
are now in version 4.1 (40). The Entebbe primer 
set was designed mid-2020, well into the first 
year of the epidemic, and used an algorithm 
and design that accommodates evolution (47). 

The effects of viral evolution on sequencing 
patterns can be seen with low median unspec- 
ified nucleotide (N) values (a consequence of 
primer dropout or low coverage at that site) that 
were observed for the first 12 months of the 
epidemic, with an increase from October 2020 
(Fig. 3A). Additional challenges appear (as in- 
dicated by increasing median N values) as 
the virus further evolved into the Delta and 
Omicron lineages from January 2021 onward 
(Fig. 3A). Comparing sequencing technologies, 
the two major platforms used (Illumina and ONT) have 
similar gap profiles (as measured by mean N 
count per genome), whereas Ion Torrent, MGI, 
and Sanger show a reduced mean N count per 
genome (Fig. 3B). The primers used in sequencing 
are a likely driver of this pattern, with 
primer choice playing a key role in the quantity 
of gaps (Fig. 3C). The mean N count per genome 
also varied modestly with viral lineage (Fig. 3D). 
Lineages that returned 
no classification with Pangolin (“none”) 
showed the highest mean N count, suggesting 
that high mean N count per genome was prob- 
ably the basis for failed classification. The more 
recent lineages, Delta (e.g., AY.39, AY.75) and 
Omicron (BA.1.1, BA.2), also showed higher 
mean N count per genome, consistent with vi- 
rus evolution impairing primer function. This 
pattern is further explored in fig. S9, where the 
position of gaps shows an enrichment in the 
genome regions after position 19,000, with fre- 
quent gaps disrupting the spike coding region. 
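
The gap metric used in this analysis can be sketched as follows: count the unspecified nucleotides (N) in each genome, then average within a grouping variable such as primer set, sequencing technology, or lineage. The sequences and group labels below are toy values, and the function names are hypothetical rather than taken from the study's code.

```python
from collections import defaultdict


def n_count(sequence):
    """Number of unspecified nucleotides (N) in one genome sequence."""
    return sequence.upper().count("N")


def mean_n_by_group(genomes):
    """Average N count per genome within each group.

    `genomes` is an iterable of (group_label, sequence) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [sum of N counts, genomes]
    for group, sequence in genomes:
        totals[group][0] += n_count(sequence)
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Toy genomes, far shorter than a real SARS-CoV-2 genome.
toy_genomes = [
    ("primer_set_X", "ACGTNNNNACGT"),  # 4 unspecified bases
    ("primer_set_X", "ACGTACGTACGT"),  # 0 unspecified bases
    ("primer_set_Y", "ACNNNNNNNNGT"),  # 8 unspecified bases
]

mean_n = mean_n_by_group(toy_genomes)
```

A per-month grouping of the same counts would give a time series analogous to the one described for Fig. 3A.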


Phylogenetic insights into the rise and spread 
of VOCs in Africa 


During the first wave of infections in 2020 in 
Africa, as was the case globally, most correspond- 
ing genomes were classified as PANGO B.1 (n = 
2456) or B.1.1 viruses (n = 1329). Toward the 
end of 2020, more-distinct viral lineages started 
to appear. Of these, the most important ones 
that affected the African continent are B.1.525 
(n = 797), B.1.1.318 (n = 398) (42), B.1.1.418 (n = 
395), A.23.1 (n = 358) (15, 29, 31, 33), C.1 (n = 
446) (29), C.1.2 (n = 300) (3D), C.36 (n = 305) 
(30, 43), B.1.1.54 (n = 287) (15, 29, 31, 33), 
B.1.416 (n = 272), B.1.177 (n = 203), B.1.620 (n = 
138), and B.1.160 (n = 61) (32) (fig. S10, A and 
B). Our discrete state phylogeographic infer- 
ence from phylogenetic reconstruction of non- 
VOC African sequences and an equal number 
of external references revealed that African 
countries were primarily seeded by multiple 
introductions of viral lineages from abroad 
(mainly Europe) at the beginning of the pan- 
demic. The observed pattern of non-VOC viral 
lineage movement then consistently shifted 
toward more intercontinental exchanges (fig. 
S10C). Mapping out the spatial routes of dis- 
semination shows that various countries in all 
subregions of the continent acted as sources 
of these viral lineages at one point or another 
(fig. S10D). Although uneven testing rates and 
proportions of samples sequenced on the con- 
tinent may have influenced these inferences (dis- 
cussed later), the results presented here are in 
line with the fact that these most predominant 
Fig. 3. Genome gap analysis. (A) The mean N count per genome by month of submission to GISAID. The 
time periods corresponding to the detection of important SARS-CoV-2 lineages are indicated at the 

top of the figure. (B) Illustration of the mean N count per genome stratified by sequencing technology. 
(C) The mean N count per genome stratified by the sequencing primers sets used. (D) Mean N count 
per genome by lineage. The mean N data were stratified by SARS-CoV-2 lineages to investigate the 
lineage-specific frequency of genome gaps, an indirect measure of primer mismatch. All lineages that were 
present at least 100 times in the genome data are presented. For (A) to (D), error bars indicate 95% 
confidence intervals. 
Fig. 4. Inferred viral dissemination patterns of VOCs within Africa. (A) Genomic 
prevalence of VOCs Alpha, Beta, Delta, and Omicron in Africa over time. (B) Inferred 
viral exchange patterns to, from, and within the continent of Africa for the four 
VOCs (Omicron as BA.1 and BA.2) based on case-sensitive phylogeographic 
inference. Introductions and viral transitions within Africa are shown as solid lines, 
and exports from Africa are shown as dotted lines; the lines are colored by continent. 


non-VOC lineages in Africa, except B.1.177, 
emerged and circulated widely in different 
subregions (Fig. 1). 

Similar to the pandemic globally, VOCs be- 
came increasingly important in Africa toward 
the end of 2020. The Alpha, Beta, Delta, and 
Omicron variants demonstrate many similar- 
ities as well as differences in the way that they 
spread on the continent. For all these VOCs, 
we observe large regional monophyletic trans- 
mission clusters in each of their phylogenetic 
reconstructions in Africa (fig. S11). This sug- 
gests an important extent of continental dis- 
semination within Africa. Alpha and Beta were 
epidemiologically important in distinct re- 
gions of the continent, with Alpha primarily 
circulating in West Africa, North Africa, and 
most of Central Africa; Beta circulating in 
southern and most of East Africa; and both 
only substantially cocirculating in a few coun- 
tries such as Angola, Kenya, Comoros, Burundi, 
and Ghana (Fig. 1 and fig. S12). However, we 
may not have enough resolution in the geo- 
spatial data to know whether and to what ex- 
tent they were truly cocirculating throughout 
these countries or whether there were regional 
outbreaks of Alpha and Beta within these coun- 
tries. In Kenya, for example, Beta was detected 
more frequently in coastal regions and Alpha 
more frequently inland (26, 44). By contrast, 
the Delta and Omicron variants sequentially 
dominated most infections on the entire con- 
tinent shortly after their emergence (Fig. 4A 
and fig. S12). 

The Alpha variant was first identified in 
December 2020 in the United Kingdom and 
has since spread globally. In Africa, Alpha was 
detected in 43 countries, with evidence of com- 
munity transmission based on phylogenetic 
clustering in many countries, including Ghana, 
Nigeria, Kenya, Gabon, and Angola (fig. S11). 
Discrete state maximum likelihood reconstruc- 
tion from a globally case-sensitive genomic sub- 
sampling inferred at least 80 introductions 
[95% confidence interval (CI): 78 to 82] into 
Africa, with the bulk of imports attributed to the 
United States (>47%) and the United Kingdom 
(>25%) (Fig. 4B). Only 1% of imports into any 
particular African country were attributed to 
another African nation. Phylogeographic re- 
construction enriched in African sequences 
revealed that of those, >85% of the intercon- 
tinental Alpha exchanges in Africa originated 
from West African countries (Fig. 4C). This 
occurred in spite of initial importations of the 
Alpha variant from Europe into all regions of 
the continent (fig. S13B) but is in line with 
Alpha having dominated circulation mostly in 
West Africa (fig. S12). In countries where Alpha 
was introduced but did not grow and cause an 
expansion of cases, this can be explained by 
competition with the already established Beta 
variant, which simultaneously circulated. The 
pattern of multiple introductions of 
Alpha into Africa and between African countries 
is similar to the spread of Alpha that has 
been documented in the United Kingdom, 
Scotland, and Ireland (45-47). 

The second VOC, Beta, was identified in 
December 2020 in South Africa (4). However, 
sampling and molecular clock analyses suggest 
that the variant originated around September 
2020 (fig. S11). At the end of 2020 and be- 
ginning of 2021, Beta was driving a second 
wave of infection in South Africa and quickly 
spread to other countries within the region. 
The concurrent introductions and spread of 
Alpha and other variants (Eta, A.23.1) in other 
regions of the continent may have reduced 
the Beta variant’s initial growth, limiting its 
spread largely to southern Africa and, to a 
lesser extent, the East Africa region. Beta spread 
to at least 114 countries globally, including 
37 countries and territories in Africa. For this 
variant, viral circulation and geographical exchanges 
occurred predominantly within the 
continent. Indeed, phylogeographic recon- 
struction from a globally case-sensitive sam- 
pling revealed that of the 810 (95% CI: 803 to 
818) inferred introductions of the Beta var- 
iant into African countries, only 110 (95% CI: 
105 to 115; 13%) were attributed to sources 
outside the continent (fig. S13C), whereas 
more than half of the introductions were at- 
tributed to South Africa (63%) (Fig. 4C). This 
is in line with expectations because the variant 
originated in South Africa. Beyond southern 
Africa, most of the introductions back into the 
continent were attributed to France and other 
European Union countries into the French 
overseas territories, Mayotte and Reunion, and 
other Francophone African countries. Africa- 
focused phylogeographic analysis revealed a 
similar spatial pattern that showed southern 
countries as substantial sources of the variant, 
followed in small numbers by countries in 
East Africa (Fig. 4C). 
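
Once ancestral locations have been inferred on a phylogeny, summaries such as those above reduce to tallying location changes along branches. The sketch below assumes the reconstruction has already been exported as a list of (origin, destination) pairs; the transitions, the country subset, and the function name are toy values chosen for illustration and do not reflect the study's inferred events.

```python
from collections import Counter

# Toy subset of African locations, for illustration only.
AFRICAN = {"South Africa", "Botswana", "Kenya", "Mozambique"}

# Hypothetical (origin, destination) changes read off a reconstructed tree.
transitions = [
    ("South Africa", "Botswana"),
    ("South Africa", "Mozambique"),
    ("France", "Kenya"),
    ("South Africa", "Kenya"),
    ("Botswana", "South Africa"),
    ("South Africa", "United Kingdom"),  # an export, not an introduction
]


def summarize_introductions(events, african):
    """Tally introductions into African locations and their sources."""
    intros = [(o, d) for o, d in events if d in african and o != d]
    external = sum(1 for origin, _ in intros if origin not in african)
    return {
        "total": len(intros),
        "external": external,
        "external_share": external / len(intros) if intros else 0.0,
        "by_source": Counter(origin for origin, _ in intros),
    }


summary = summarize_introductions(transitions, AFRICAN)
```

Confidence intervals such as those quoted in the text would require repeating the tally over replicate reconstructions, which this sketch does not attempt.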

The fourth VOC observed was Delta (73), 
which rose to prominence in April 2021 in India, 
where it fueled an explosive second wave. Since 
its emergence, Delta has been detected in >170 
countries, including 37 African countries and 
territories (fig. S11). Our global case-sensitive 
subsampled analysis infers at least 100 (95% CI: 
93 to 106) introductions of the Delta variant 
into Africa, with the bulk attributed to India 
(~72%), mainland Europe (~8%), the United 
Kingdom (~5%), and the United States (~2.5%). 
Viral introductions of Delta also occurred from 
one African country to others in 7% of inferred 
introductions. From our Africa-focused phylo- 
geographic inferences, we infer that unlike 
Alpha and Beta, viral dissemination of Delta 
within Africa was not restricted to or domi- 
nated by any particular region but rather spread 
across the entire continent (Fig. 4C). After in- 
troductions from Asia in the middle of 2021, 
Delta rapidly replaced the other circulating 
variants (Fig. 4A). For example, in southern 
African countries, the Delta variant rapidly dis- 
placed Beta and, by June 2021, was circulating 
at very high (>90%) frequencies (48). 

The latest VOC, Omicron, was identified and 
characterized in November 2021 in southern 
Africa (3). At the time of writing, the variant 
had been detected and caused waves of infec- 
tions in >160 countries, including 39 African 
countries and two overseas territories (fig. S11). 
Because of the genetic distance between them 
and their sequential (rather than simultane- 
ous) epidemic expansion globally, phylogenies 
were reconstructed separately for Omicron 
BA.1 and BA.2. Our discrete ancestral-state 
reconstruction from a global case-sensitive 
sampling for Omicron BA.1 infers at least 55 
(95% CI: 47 to 62) viral exports of BA.1 out of 
various African countries, of which 31 (95% CI: 
25 to 36) were toward Europe and 8 (95% CI: 
6 to 10) were toward North America (Fig. 4B). 
After explosive expansion of Omicron around 
the world, we inferred even more reintroduc- 
tions of the variant back into Africa, at least 69 
(95% CI: 60 to 78) from Europe and 102 (95% CI: 
92 to 112) from North America (Fig. 4B). From 
our Africa-focused phylogeographic reconstruc- 
tions, we determine that, as with Delta, routes 
of dissemination of this variant involved all 
regions of the continent spatially (Fig. 4C). Yet 
~75% of all BA.1 viral movement volume in 
Africa happened between southern African 
countries, likely because of rapid epidemic ex- 
pansion in the region soon after its detection 
(3). Omicron BA.2’s reach in Africa was lim- 
ited at the time of writing, with only 3260 se- 
quences from 19 countries attributed to BA.2 
on GISAID (date of access: 31 March 2022) (15% 
of all Omicron sequences from Africa). Our 
discrete ancestral-state reconstruction from 
a global case-sensitive sampling for Omicron 
BA.2 infers at least 68 (95% CI: 53 to 84) viral 
exports out of African countries, of which most 
were toward Europe (~88%) (Fig. 4B). We also 
infer at least 99 (95% CI: 87 to 109) separate 
introduction or reintroduction events of BA.2 
back into African countries, of which ~65% are 
from Europe and ~30% from Asia, primarily 
from India (Fig. 4B). This is consistent with 
India having experienced one of the earliest 
large BA.2 waves globally. In the context of 
global incidence of BA.2, this case-sensitive 
phylogeographic analysis revealed that only 
0.01% of viral movements of this lineage glob- 
ally happened from one African country to 
another. Our Africa-focused analysis inferred a 
similar pattern of BA.2 spatial diffusion within 
Africa to that of BA.1 (Fig. 4C). However, given 
that this accounted for such a small percent- 
age of global BA.2 movements, BA.2 diffusion 
from one African country to another is unlikely 
to have had a substantial impact on epidemi- 
ological expansion, compared with introduc- 
tions from Asia, Europe, or North America. 
Globally, dissemination of the SARS-CoV-2 
virus throughout the pandemic was intricately 
linked with human mobility patterns (49-53). 
To determine the validity of the VOC move- 
ment patterns that we infer into and within the 
African continent in this study, we compared viral 
import and export events to and from South 
Africa with travel to the country. In December 
2020, the United Kingdom accounted for the 
fifth-highest number of passengers entering 
South Africa, whereas other countries with the 
top-nine sources of travelers were all neighbor- 
ing countries in southern Africa (fig. S14A). Con- 
sidering that incidence of the Alpha variant 
was not meaningful in the region, this sup- 
ports our inference of the United Kingdom 
contributing 60% of Alpha introductions to 
South Africa (fig. S15A). In March 2021, the 
United States, Germany, the United Kingdom, 
and India were among the top-12 sources of 
travelers to South Africa after eight African 
countries (fig. S14B). During this time of Delta 
dissemination globally, we infer that ~90% of 
introductions of Delta into South Africa orig- 
inated in the United Kingdom, the United States, 
and India (fig. S15B). At the end of 2021, most 
introductions or reintroductions of Omicron 
to the country came from the United Kingdom, 
the United States, or Botswana, corresponding 
to locations of both high Omicron incidence at 
the time and high numbers of passengers to 
South Africa (figs. S14C and S15C). These travel 
patterns also fit the findings that ~89, ~70, and 
~75% of Beta, Delta, and Omicron exports, re- 
spectively, from South Africa to other African 
countries were directed to locations in southern 
Africa (figs. S14, D and E, and S15, D and E). 


Discussion, limitations, and conclusions 


By April 2020, a total of 20 African countries 
were able to sequence the virus within their 
own borders. This was largely made possible 
by other preexisting sequencing efforts on the 
continent that were focused on other human 
pathogens (e.g., HIV, tuberculosis, Ebola, and 
H1N1). However, these efforts were quickly 
limited by global supply chain issues, and, in 
many countries, sequencing efforts substan- 
tially slowed down or stopped toward the end of 
2020. To facilitate more sequencing on the con- 
tinent over the course of the past year (April 2021 
to March 2022), the Africa CDC and partners 
invested heavily to support genomic surveillance 
on the continent. This included the transfer 
of 24 new sequencing platforms (including 
MinION, GridION, MiSeq, and NextSeq), the distribution 
of reagents and flow cells to support 
the sequencing of 100,000 positive samples, the 
training of >230 students and technicians in 
wet laboratory and bioinformatic techniques, 
and additional grants to support 10 regional 
sequencing hubs. This investment has started 
bearing fruit and should be intensified as the 
virus continues to evolve, requiring the adap- 
tation of methodologies locally on the continent 
to keep pace with the emergence of variants. The 
continued development of sequencing proto- 
cols in Africa is of crucial importance (41, 54, 55) 
given the number of variants and lineages 
that emerged in, and were introduced to, the 
continent. In North Africa, the SARS-CoV-2 
pandemic was caused by waves of infections 
that were similar to those seen in Europe (first 
wave attributed to B.1 descendants, second 
wave to Alpha, third wave to Delta, and fourth 
wave to Omicron); in southern Africa, the pat- 
tern was similar but with a Beta wave instead 
of an Alpha one. In East Africa, the pandemic 
was more complex, involving both Alpha and 
Beta as well as its own lineage A.23.1 before the 
arrival of Delta and Omicron. Central Africa 
experienced epidemic patterns that sometimes 
mirrored those of East Africa and other times 
those of southern Africa. In West Africa, Eta 
made a considerable contribution to both a 
second wave (together with Alpha) and a third 
wave (together with Delta). The factors that 
resulted in these regional differences are not 
clear but could be due to differences in human 
mobility, founder effects, competition between 
lineages, or the immunity induced by earlier 
waves in a region. 

Public health benefits of such broadly in- 
clusive genomic surveillance are manifold. The 
most prominent insight from this expanded 
genomic surveillance in Africa has been an 
early warning capacity for the world after the 
detection of new lineages and variants, most 
recently relevant in the detection of Omicron 
BA.1, BA.2, BA.3, BA.4, and BA.5 subvariants 
(3, 4, 34). Furthermore, the reporting of local 
SARS-CoV-2 sequences made the epidemic 
more immediate to the Ministries of Health 
from the reporting African countries. It be- 
came clear early on that the viral evolution is 
global and that the transmission of the virus 
is extremely rapid, which guided mitigation 
strategies. The generation and availability of 
local sequences also validated local diagnos- 
tics and allowed investigators to determine 
whether nucleic acid-based diagnostics that 
were in use could still detect local variants. 
The detection of SARS-CoV-2 in returning 
travelers and truck drivers indicated routes 
that the virus might be using to enter a coun- 
try and guided early efforts to slow virus entry 
and gain time to establish vaccination plans. 
Later, the difficulty of stopping the virus at 
borders combined with data showing that the 
variants were already in community circula- 
tion allowed public health officials to focus 
efforts and limited resources on vaccination 
rather than on border controls. The detection 
and reporting of more-recent lineages with 
enhanced transmission and the ability to bypass 
existing immunity (i.e., Omicron) provide important 
information and an early alert to public 
health officials globally that the epidemic is 
still proceeding. As the pandemic progresses 
in an evolving global context, we provide evi- 
dence that with each new variant, transmis- 
sion dynamics are changing and the use of 
sequencing with phylogenetics could poten- 
tially alter decisions of public health measures. 
For example, the demonstrated shift away from 
regional dynamics of Alpha and Beta toward 
more global patterns with Delta and Omicron 
can provide insights to public health officials 
as they anticipate epidemic developments lo- 
cally. With Omicron, it became clear that al- 
though the variant expanded first in Africa, 
the continent ultimately had a minimal role 
in global dissemination and that continental 
expansion beyond southern Africa was most 
influenced by external introductions, in contrast 
to the Beta variant. All of these public 
health benefits of sequencing SARS-CoV-2 are 
amplified, as we show in this study, 
if the sequencing can be conducted locally 
within a country, which strongly supports the 
continued investment into pathogen sequenc- 
ing on the continent. 

Despite the recent successful expansion of 
genomic surveillance in Africa, additional work 
is necessary. Even with investments from the 
Africa CDC-Africa Pathogen Genomics Initiative 
and other investments, there are still 16 coun- 
tries with no sequencing capacity within their 
own borders. The only option for these coun- 
tries is to send samples to continental sequenc- 
ing hubs or to centers outside of the continent, 
which increases turnaround times and limits 
the utility of genomic surveillance for public 
health decision-making. Secondly, not all coun- 
tries are willing to share data openly in a timely 
fashion for fear of being subject to travel bans 
or restrictions that could bring substantial eco- 
nomic harm. Such hesitancy has obvious po- 
tential ramifications for the future of genomic 
surveillance on the continent. Furthermore, 
with the expansion of sequencing on the con- 
tinent, there is a growing need for more bio- 
informatics support and knowledge to allow 
investigators to analyze and report their data 
in a reasonable time frame that makes it use- 
ful for a public health response. It is also clear 
that the SARS-CoV-2 sequencing primers are 
not static and may require updating 
as the virus evolves. A number of research 
groups have been addressing the SARS-CoV-2 
sequencing primer questions. Issues of gaps in 
the genomes due to missing amplicons have 
been discussed (56, 57). The ARTIC primer set 
has gone through a number of revisions to ac- 
commodate virus evolution (39, 40). Additional 
longer amplicon methods have been published 
(58-60), including methods to use a subset of 
ARTIC primers (67). 

The patterns we describe here are of course 
limited to reported cases, a limitation that applies to both the 
phylogeographic and the epidemiological 
inferences. As such, the results need to be interpreted 
with these limitations in mind. Our 
primary phylogeographic inference relied on a 
sampling strategy that considered all high- 
quality African sequences and an equal number 
of external references. Though this strategy has 
the advantage of placing all African sequences 
in a phylogenetic context, it introduces a bias 
when applied to discrete ancestral-state recon- 
struction because more internal nodes are in- 
ferred to be from Africa. To address this, we 
performed an even sampling of global cases, 
based on reported case counts through time, to 
compare against our oversampled inference. 
The even-sampling approach has the benefit 
that the discrete ancestral-state reconstruction 
is not biased by uneven sampling. Comparing 
the two reveals obvious differences, 
most notably that the number of inferred 
introductions into Africa is proportional to 
sampling proportions (fig. S16) because we no 
longer consider all African sequences but rather 
just a small subset against a global sample. 
However, inferences from the two approaches 
correspond well with one another. For exam- 
ple, considering Alpha, we still observed that 
the vast majority of introductions into Africa 
originated from Western Europe. Patterns of 
dissemination within Africa are more robustly 
comparable between the two, for instance, that 
countries in West Africa were the biggest source 
of Alpha within the continent. High concor- 
dance between the two inference methods was 
also observed for other VOCs for dispersal routes 
within Africa, which gives us confidence in the 
inferred patterns we observe here. Although we 
present an inference based on oversampling 
and case-sensitive sampling, it is, at present, not 
possible to explore how undersampling affects 
the phylogeographic reconstruction because of 
uneven testing rates. Additionally, the robust- 
ness of the phylogeographic inference can also 
be affected by the underlying methodology 
that is used. Broad consensus would favor the 
use of Bayesian methods for phylogeographic 
reconstruction, which are often considered to 
be the “gold standard” in the field. The main 
drawbacks of Bayesian methods are that they 
can only be applied to a relatively small num- 
ber of sequences at a time (<1000) and they 
are extremely computationally and time inten- 
sive. Given the explosion of sequence data over 
the past 2 years, the scientific community will 
have to adapt or put forth new analytical meth- 
ods to fully capitalize on the global sequencing 
efforts for SARS-CoV-2. 
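
The idea behind case-sensitive subsampling can be illustrated with a short sketch: a fixed sequence budget is divided among location-by-time strata in proportion to reported case counts, and sequences are then drawn at random within each stratum. This is an assumed, simplified procedure written for illustration; the stratum keys, budget, and function names are hypothetical, and the exact sampling scheme used in the study is not specified in this section.

```python
import random


def allocate_quota(case_counts, total_budget):
    """Split a sequence budget across strata in proportion to reported cases.

    Rounding means the quotas may not sum exactly to `total_budget`.
    """
    total_cases = sum(case_counts.values())
    if total_cases == 0:
        return {stratum: 0 for stratum in case_counts}
    return {
        stratum: round(total_budget * cases / total_cases)
        for stratum, cases in case_counts.items()
    }


def subsample(sequences_by_stratum, case_counts, total_budget, seed=0):
    """Randomly draw up to the quota of sequence IDs from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    quotas = allocate_quota(case_counts, total_budget)
    chosen = {}
    for stratum, seq_ids in sequences_by_stratum.items():
        k = min(quotas.get(stratum, 0), len(seq_ids))  # cap at availability
        chosen[stratum] = rng.sample(seq_ids, k)
    return chosen


# Toy strata keyed by (region, epiweek).
case_counts = {("Region A", "2021-W01"): 8000, ("Region B", "2021-W01"): 2000}
available = {
    ("Region A", "2021-W01"): [f"A{i}" for i in range(5)],
    ("Region B", "2021-W01"): [f"B{i}" for i in range(20)],
}

picked = subsample(available, case_counts, total_budget=10)
```

In the toy input, Region A is allotted eight sequences but has only five available, which mirrors how undersampled locations remain underrepresented even under a case-proportional scheme.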

Despite our best attempts to consider and 
minimize genomic sampling bias, the accuracy 
of the resulting phylogenetic inferences is lim- 
ited by the available epidemiological and ge- 
nomic data, leading to unaccounted biases in 
the estimates of viral movements. This includes 
limited testing and subsequent sequencing 
in many African countries. Although the per- 
centage of reported cases sequenced in African 
countries (0.01 to 10%, mean = 1.27%) is not far 
from global figures (0.01 to 16%, mean = 1.31%), 
testing rates and infection-to-detection ra- 
tios in Africa were some of the lowest globally 
(38, 62). Together with estimates of excess mor- 
tality being as much as 20-fold greater than the 
reported numbers in African countries (63), 
these are strong indications of undetected and 
underreported epidemic sizes in Africa, lead- 
ing to undersampling of genomic data (62) 
and thus underestimates of viral exchange 
inferences in our study. Some countries with 
no publicly available SARS-CoV-2 sequences 
are, by definition, completely missing in our 
inference. This in turn means that inferred 
routes of viral transmission within Africa could 
be missing important intermediate locations, 
although this is potentially true around the 
world. Nevertheless, we believe that the viral 
movement inferences that we discuss in this 
study provide a likely qualitative description 
of the patterns of SARS-CoV-2 migration into, 
out of, and within Africa. 

Finally, we should also mention uneven se- 
quencing and reporting standards across the 
different laboratories on the continent—and 
globally, for that matter. Different groups use 
different measures for what constitutes a high- 
quality sequence (e.g., 70 versus 80% sequence 
coverage) or use different sequencing depth 
coverage. This lack of global standardization 
complicates the direct comparison of sequences 
that may have been submitted to GISAID using 
different criteria, further biasing any inference. 
Given the sheer size of SARS-CoV-2 sequenc- 
ing, with ~10 million whole-genome sequences 
shared on the GISAID database (date of access: 
31 March 2022), there is an urgent need for 
global standards with regard to sequence qual- 
ity and associated metadata. 
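
The effect of differing quality thresholds can be shown with a minimal sketch in which the same genome passes a 70% coverage cutoff but fails an 80% cutoff. Coverage is defined here, as an assumption, as the fraction of reference positions with an unambiguous base call; the toy genome, the reference length of 100, and the function names are all hypothetical.

```python
def coverage_fraction(sequence, reference_length):
    """Fraction of reference positions with an unambiguous A/C/G/T call."""
    called = sum(1 for base in sequence.upper() if base in "ACGT")
    return called / reference_length


def passes_qc(sequence, reference_length, min_coverage):
    """True if the genome meets the given minimum coverage threshold."""
    return coverage_fraction(sequence, reference_length) >= min_coverage


# Toy genome of length 100 with 75 called bases and 25 unspecified bases.
toy_genome = "ACGT" * 15 + "N" * 25 + "ACG" * 5

coverage = coverage_fraction(toy_genome, reference_length=100)
accepted_at_70 = passes_qc(toy_genome, 100, min_coverage=0.70)
accepted_at_80 = passes_qc(toy_genome, 100, min_coverage=0.80)
```

Two laboratories applying these two cutoffs would therefore disagree on whether this genome is high quality, which is the comparability problem described above.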

Africa needs to continue expanding genomic 
sequencing technologies on the continent in 
conjunction with diagnostic capabilities. This 
holds true not just for SARS-CoV-2 but also for 
other emerging or reemerging pathogens on 
the continent. For example, in February 2022, 
the WHO announced the reemergence of wild 
polio in Africa, and sporadic influenza H1N1, 
measles, and Ebola outbreaks continue to occur 
on the continent. The Africa CDC has estimated 
that more than 100 pathogen outbreaks are 
reported across the continent every year. Be- 
yond the current pandemic, continued invest- 
ment in diagnostic and sequencing capacity for 
these pathogens could serve the public health 
of the continent well into the 21st century. 


Materials and methods 
Ethics statement 


This project relied on sequence data and as- 
sociated metadata that are publicly shared by 
the GISAID data repository and adhere to the 
terms and conditions laid out by GISAID (6). 
The African samples processed in this study 
were obtained anonymously from material ex- 
ceeding the routine diagnosis of SARS-CoV-2 
in African public and private health labora- 
tories. Individual institutional review board 
references or material transfer agreements 
(MTAs) for countries are as follows: Angola 
(MTA - CON8260); Botswana: genomic surveillance 
in Botswana was approved by the Health 
Research and Development Committee (protocol 
HPDME 13/18/1); Egypt: surveillance in 
Egypt was approved by the Research Ethics 
Committee of the National Research Centre 
(Egypt) (protocol number 14 155, dated 22 March 
2020); Kenya: samples were collected under 
the Ministry of Health protocols as part of the 
national COVID-19 public health response, and 
the whole-genome sequencing study protocol 
was reviewed and approved by the Scientific 
and Ethics Review Committee (SERU) at Kenya 
Medical Research Institute (KEMRI), Nairobi, 
Kenya (SERU protocol #4035); Nigeria (NHREC/01/01/2007); 
Mali: study of the sequence of SARS-CoV-2 isolates 
in Mali, Letter of Ethical Committee 
(NO-2020 /201/CE/FMPOS/FAPH of 09/17/ 
2020); Mozambique (MTA - CON7800); Malawi 
(MTA - CON8265); South Africa: the use of South 
African samples for sequencing and genomic 
surveillance was approved by University of 
KwaZulu-Natal Biomedical Research Ethics 
Committee (ref. BREC/00001510/2020), the Uni- 
versity of the Witwatersrand Human Research 
Ethics Committee (HREC) (ref. M180832), Stel- 
lenbosch University HREC (ref. N20/04/008_ 
COVID-19), the University of the Free State Re- 
search Ethics Committee (ref. UFS-HSD2020/ 
1860/2710), and the University of Cape Town 
HREC (ref. 383/2020); Tunisia: for sequences 
derived from sampling in Tunisia, all patients 
provided their informed consent to use their 
samples for sequencing of the viral genomes, 
and the ethical agreement was provided to the 
research project ADAGE (PRFCOVID19GP2) by 
the Committee of Protection of Persons (Tunisian 
Ministry of Health) under the reference CPP 
SUD N 0265/2020; Uganda: the use of samples 
and sequences from Uganda was approved by 
the Uganda Virus Research Institute, Research 
and Ethics Committee UVRI-REC Federalwide 
Assurance (FWA) no. 00001354, study refer- 
ence GC/127/20/04/771, and by the Uganda 
National Council for Science and Technology, 
reference number HS936ES; and Zimbabwe 
(MTA - CON8271). 


Epidemiological and genomic data dynamics 


We analyzed trends in daily numbers of cases 
of SARS-CoV-2 in Africa up to 31 March 2022 
from publicly released data provided by the Our 
World in Data repository for the continent of 
Africa (https://github.com/owid/covid-19-data/tree/master/public/data) 
as a whole and for individual countries (2). To provide a comparable 
view of epidemiological dynamics over time in 
various countries, the variable under primary 
consideration for Fig. 1 was “new cases per 
million (smoothed).” To calculate the genomic 
sampling proportion and frequency for each 
country for Fig. 2, the total number of recorded 
cases as of 31 March 2022 was considered, as 
well as the total length of time for which each 
country had recorded cases of SARS-CoV-2. 
Genomic metadata was downloaded for all 
African entries on GISAID for the same time 
period (date of access: 31 March 2022). From 
this, information extracted from all entries for 
this study included the date of sampling, coun- 
try of sampling, viral lineage and clade, orig- 
inating laboratory, sequencing laboratory, and 
date of submission to the GISAID database. 
The geographical locations of the originating 
and sequencing laboratories were manually 
curated. Sequences originating and sequenced 
in the same country were defined as locally se- 
quenced, irrespective of specific laboratory or 
finer location. Sequences originating in one 
African country and sequenced in another 
were defined as sequenced within regional 
sequencing networks. Sequences sequenced in 
a location not within Africa were labeled as 
sequenced outside Africa. Sequencing turn- 
around time was defined as the number of 
days that had elapsed from specimen collection 
to sequence submission to GISAID. Sequencing 
technology information for all African en- 
tries was also downloaded from GISAID on 
31 March 2022. 
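
The sequencing-location labels and the turnaround-time definition above can be sketched in a few lines of Python. This is only an illustration of the stated definitions, not the scripts used in the study: the country set is a small illustrative subset, the records are invented, and the function names are hypothetical.

```python
from datetime import date

# Illustrative subset only; a real analysis would list every African country.
AFRICAN_COUNTRIES = {"Kenya", "Nigeria", "South Africa", "Uganda"}


def sequencing_category(origin_country: str, sequencing_country: str) -> str:
    """Label an African sample by where its genome was sequenced."""
    if sequencing_country == origin_country:
        return "local"
    if sequencing_country in AFRICAN_COUNTRIES:
        return "regional network"
    return "outside Africa"


def turnaround_days(collected: date, submitted: date) -> int:
    """Days elapsed from specimen collection to submission to GISAID."""
    return (submitted - collected).days


# Hypothetical records: (origin, sequencing country, collection date, submission date)
records = [
    ("Kenya", "Kenya", date(2021, 6, 1), date(2021, 6, 15)),
    ("Uganda", "South Africa", date(2021, 6, 1), date(2021, 7, 1)),
    ("Nigeria", "United Kingdom", date(2021, 6, 1), date(2021, 8, 30)),
]

for origin, lab_country, collected, submitted in records:
    print(origin, sequencing_category(origin, lab_country),
          turnaround_days(collected, submitted))
```

The classification depends only on the curated country of the originating and sequencing laboratories, which matches the statement that finer location is ignored.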


Primer choice and sequencing outcomes 


All SARS-CoV-2 genomes from African coun- 
tries were retrieved from GISAID (6) for sub- 
mission dates from 1 December 2019 to 31 March 
2022, yielding 100,470 entries. Associated meta- 
data for the entries were also retrieved, includ- 
ing collection date, submission date, country, 
viral strain, and sequencing technology. Data 
on the primers used for the sequencing were 
requested from investigators and yielded primer 
data for 13,973 of the entries (~13%). The number 
of N bases (positions with low sequencing depth) 
per genome was counted, and these counts were 
then used for genome quality analysis and 
visualization. Gap locations in the genomes were 
mapped and visualized with respect to the 
original Wuhan strain (64). 


Phylogenetic investigation 


All African sequences on the GISAID sequence 
database (16) were downloaded on 31 March 
2022 (n = 100,470). Of these, Alpha accounted 
for 3851 sequences, Beta accounted for 14,548 
sequences, Delta accounted for 35,027 sequences, 
Omicron accounted for 21,708 sequences, and 
25,336 sequences were classified as non-VOCs. 
Before any phylogenetic inference, we performed 
some quality assessment on the sequences to 
exclude incomplete or problematic sequences 
as well as sequences lacking complete meta- 
data. Briefly, all African sequences were passed 
through the NextClade analysis pipeline (65) 
to identify and exclude (i) sequences missing 
>10% of the SARS-CoV-2 genome, (ii) sequences 
that deviate by >70 nucleotides from the Wuhan 
reference strain, (iii) sequences with >10 ambig- 
uous bases, (iv) clustered mutations, and (v) 
sequences flagged with private mutations by 
NextClade. Additionally, Omicron variants were 
screened for traces of viral recombination 
with RDP5.23 (66) using default settings and a 
p value of <0.05 as evidence of recombination. 
A large number of sequences were removed 
(n = 57,421), with incomplete sequences (<90% 
genome coverage) being the biggest contrib- 
utor. This produced a final African dataset of 
43,049 high-quality African sequences. Because 
of the sheer size of the dataset, we opted to per- 
form independent phylogenetic inferences 
on the main VOCs (Alpha, Beta, Delta, and 
Omicron BA.1 and BA.2) that have spread on the 
African continent, as well as a separate inference 
for all non-VOC SARS-CoV-2 sequences. 
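
As a rough illustration of the exclusion criteria listed above, the sketch below applies the stated thresholds to per-sequence quality records and checks the reported sequence counts. The dictionary keys are hypothetical names chosen for this example and are not NextClade output column names.

```python
def passes_quality_filters(record: dict) -> bool:
    """Apply the five exclusion criteria described in the text.

    All keys are hypothetical field names used only for this sketch.
    """
    return (
        record["missing_fraction"] <= 0.10        # (i) missing >10% of the genome
        and record["substitutions"] <= 70         # (ii) >70 nucleotides from the reference
        and record["ambiguous_bases"] <= 10       # (iii) >10 ambiguous bases
        and not record["clustered_mutations"]     # (iv) clustered mutations
        and not record["flagged_private_mutations"]  # (v) flagged private mutations
    )


good = {"missing_fraction": 0.02, "substitutions": 45, "ambiguous_bases": 3,
        "clustered_mutations": False, "flagged_private_mutations": False}
incomplete = dict(good, missing_fraction=0.25)

print(passes_quality_filters(good), passes_quality_filters(incomplete))

# Bookkeeping check on the counts reported in the text.
variant_counts = {"Alpha": 3851, "Beta": 14548, "Delta": 35027,
                  "Omicron": 21708, "non-VOC": 25336}
total_downloaded = sum(variant_counts.values())
removed = 57421
retained = total_downloaded - removed
print(total_downloaded, retained)
```

The per-variant counts sum to the 100,470 downloaded sequences, and subtracting the 57,421 removed sequences gives the 43,049 retained.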

To evaluate the spread of the virus on the 
African continent, we aligned the African data- 
sets against a large number of globally repre- 
sentative sequences from around the world. 
Because of the oversampling of some variants 
or lineages, we performed a random down sam- 
pling while retaining the oldest two known 
variants from each country. Reference sequences 
were respectively aligned with their African 
counterparts independently with NextAlign 
(65). Each of the alignments was then used to 
infer maximum likelihood (ML) tree topologies 
in FastTree v 2.0 (67) using the general time 
reversible model of nucleotide substitution 
and a total of 100 bootstrap replicates (68). The 
resulting ML tree topologies were first inspected 
in TempEst (69) to identify any sequences that 
deviate more than 0.0001 from the residual 
mean. After the removal of potential outliers 
in R with the ape package (70), the resulting 
ML trees were then transformed into time-calibrated 
phylogenies in TreeTime (71) by applying a rate of 
8 × 10^-4 substitutions per site per year (72) to 
transform the branches into units 
of calendar time. Time-calibrated trees were 
then visualized, along with associated metadata, 
in R using ggtree (73) and other packages. 

We performed a basic viral dispersal analy- 
sis for each of the VOCs (excluding Gamma) as 
well as for the non-VOC dataset. Briefly, a mi- 
gration model was fitted to each of the time- 
calibrated tree topologies in TreeTime, mapping 
the country location of sampled sequences to 
the external tips of the trees. The mugration 
model of TreeTime also infers the most likely 
location for internal nodes in the trees. Using 
a custom python script, we could then count 
the number of state changes by iterating over 
each phylogeny from the root to the external 
tips. We count state changes when an internal 
node transitions from one country to a different 
country in the resulting child node or tip(s). The 
timing of transition events is then recorded, 
which serves as the estimated import or export 
event. To infer some confidence around these 
estimates, we performed 10 replicates for each of 
the datasets by random selection from the 100 
bootstrap trees. Because of the high uncertainty 
in the inferred locations for deep internal nodes 
in the trees, we truncated state changes to the 
earliest date of sampling in each dataset. All data 
analytics were performed using custom python 
and R scripts, and the results were visualized 
using the ggplot libraries (74). Such phylogeo- 
graphic methods are always subject to uneven 
sampling through time (i.e., over the course of 
the pandemic) and through space (by sam- 
pling location). To address this, we have per- 
formed a case-sensitive analysis to investigate 
the effects of oversampling African locations 
on the inferred number of viral introductions. 
Furthermore, in a previous analysis (15), we 
performed a sensitivity analysis to address some 
of these issues and found no substantial var- 
iations in estimates. 
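
The state-change counting described above can be sketched as a root-to-tip traversal. The version below is a minimal illustration and not the custom script used in the study: the `Node` class is hypothetical, the date of the child node is assumed to be the timing of each event, and truncation is assumed to mean discarding events dated before the earliest sampling date.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    country: str   # sampled location (tip) or inferred location (internal node)
    date: float    # node date in decimal years
    children: list = field(default_factory=list)


def count_transitions(root: Node, earliest_sampling_date: float) -> list:
    """Collect (origin, destination, date) for every parent-to-child country change."""
    events = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            changed = child.country != node.country
            # Assumption: events before the earliest sampling date are dropped.
            if changed and child.date >= earliest_sampling_date:
                events.append((node.country, child.country, child.date))
            stack.append(child)
    return events


# Hypothetical toy phylogeny.
tree = Node("South Africa", 2020.0, [
    Node("South Africa", 2020.2, [
        Node("Kenya", 2020.5),
        Node("South Africa", 2020.4),
    ]),
    Node("United Kingdom", 2020.1, [
        Node("United Kingdom", 2020.3),
    ]),
])

events = count_transitions(tree, earliest_sampling_date=2020.3)
print(events)
```

Running the same function over several bootstrap trees and summarizing the event counts would give the kind of replicate-based uncertainty described in the text.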


Case-sensitive phylogeographic inference 


To address the potential oversampling of African 
sequences relative to global reference in the 
above-mentioned analyses, we performed an- 
other phylogeographic inference on subsamples 
based on global case counts to try to eliminate 
oversampling bias in our inference. To this 
end, we considered all high-quality sequences 
for each of the VOCs (Alpha, Beta, Delta, and 
Omicron BA.1 and BA.2) globally over the same 
sampling period (until 31 March 2022). We used 
subsampler (https://github.com/andersonbrito/ 
subsampler) to generate subsamples for each 
variant based on globally reported cases. In 
short, subsampler uses a case-count matrix 
of daily cases, along with the fasta sequences 
and GISAID associated metadata, to sample 
a user-defined number of sequences. For each 
VOC and for BA.1 and BA.2, we performed 
10 samplings using different number seeds to 
sample datasets of ~20,000. Once again, sam- 
pled sequences were screened for viral recom- 
bination as described above and sequences 
with signs of recombination were removed. Sub- 
sampler has the added advantage that it dis- 
regards poor quality sequences (e.g., <90% 
coverage) and sequences with missing meta- 
data (e.g., exact date of sampling). Each data- 
set was then subjected to the same analytical 
pipeline as mentioned above to infer the viral 
transitions between Africa and the rest of 
the world. 
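
The idea of sampling sequences in proportion to reported cases can be illustrated with a short sketch. It is a simplified stand-in and not the subsampler tool itself: it ignores the daily time dimension of the case-count matrix, and the function name, inputs, and counts are all hypothetical.

```python
import random


def case_weighted_subsample(sequences_by_country: dict, cases_by_country: dict,
                            target_size: int, seed: int) -> list:
    """Draw sequences per country in proportion to that country's reported cases."""
    rng = random.Random(seed)  # a fixed seed makes each replicate reproducible
    total_cases = sum(cases_by_country.values())
    sampled = []
    for country in sorted(sequences_by_country):
        available = sequences_by_country[country]
        quota = round(target_size * cases_by_country.get(country, 0) / total_cases)
        # Never request more sequences than exist for a country.
        sampled.extend(rng.sample(available, min(quota, len(available))))
    return sampled


# Hypothetical inputs: two locations with equal sequencing but unequal case counts.
sequences = {"A": [f"A{i}" for i in range(100)],
             "B": [f"B{i}" for i in range(100)]}
cases = {"A": 300, "B": 100}

subsample = case_weighted_subsample(sequences, cases, target_size=40, seed=1)
print(len(subsample), sum(name.startswith("A") for name in subsample))
```

Repeating the call with different seeds mirrors the use of 10 independent samplings per variant.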


Regional and country-specific NextStrain builds 


To investigate more-granular changes in line- 
age dynamics within a specific country or re- 
gion in Africa, we used the NextStrain pipeline 
(https://github.com/nextstrain/ncov) to gen- 
erate the regional and country-specific builds 
for African countries (75). First, all sequence 
data and metadata were retrieved from the 
GISAID sequence database and filtered for 
Africa based on the “region” tab for inclusion 
in regional and country-specific African builds. 
For country-specific builds, ~4000 sequences 
from a given country were randomly selected 
and analyzed against ~1000 randomly selected 
sequences from the Africa “nextregions” records 
that do not match the focal country of interest. 
For regional (e.g., West Africa) builds, ~4000 
sequences from the focal region were selected 
at random and analyzed against ~1000 ran- 
domly selected sequences from the Africa 
“nextregions” records that do not match the 
focal region of interest. The methodological 
pipeline for NextStrain is well documented 
and performs all analyses within one workflow, 
including filtering of sequences, alignment, 
tree inference, molecular clock, and ancestral- 
state reconstruction. For more information, 
please visit https://docs.nextstrain.org/en/latest/index.html. 


All regional and country-specific builds are 
regularly updated to keep track of the evolv- 
ing pandemic on the continent. All builds are 
publicly available under the links provided 
in tables S1 and S2 as well as on the NextStrain 
web page (https://nextstrain.org/sars-cov-2/#datasets). 

Electrochemical potential enables dormant spores 
to integrate environmental signals 

Emmanuel A. Theodorakis, Jordi Garcia-Ojalvo, Gürol M. Süel 

The dormant state of bacterial spores is generally thought to be devoid of biological activity. We show 
that despite continued dormancy, spores can integrate environmental signals over time through a 
preexisting electrochemical potential. Specifically, we studied thousands of individual Bacillus subtilis 
spores that remain dormant when exposed to transient nutrient pulses. Guided by a mathematical model 
of bacterial electrophysiology, we modulated the decision to exit dormancy by genetically and chemically 
targeting potassium ion flux. We confirmed that short nutrient pulses result in step-like changes in 
the electrochemical potential of persistent spores. During dormancy, spores thus gradually release their 
stored electrochemical potential to integrate extracellular information over time. These findings 
reveal a decision-making mechanism that operates in physiologically inactive cells. 


The formation of bacterial spores (sporula- 
tion) is a common and well-characterized 
survival strategy in many microbial species 
(1, 2). Spores are partially dehydrated cells 
enclosed by a protective coat that can sur- 
vive environmental extremes and remain dor- 
mant for years (3). They need to be robust to 
environmental fluctuations to avoid exiting their 
dormant state (germinating) prematurely. At 
the same time, spores need to germinate if they 
detect favorable conditions (4) (Fig. 1A). Ger- 
mination requires the rehydration of the spore, 
which is promoted by the release of calcium- 
dipicolinic acid (CaDPA) (5). Aside from deg- 
radation of RNA immediately after sporulation 
(6), dormant spores appear to have no measur- 
able metabolic or biological activity (7). There- 
fore, it remains unclear whether dormant spores 
possess any activity that could affect the choice 
of whether or not to germinate. We thus tested 
whether dormant Bacillus subtilis spores ex- 
perience any physiological changes in response 
to subtle environmental signals that do not trig- 
ger germination. Addressing these questions 
could reveal how spores reconcile their robust 
dormant state with the need to process extra- 
cellular information and make an informed 
decision on whether to continue or exit their 
dormancy. 
Spores can be pretreated with nutrients to 
promote germination (8-10). These findings 
imply that spores can somehow integrate extra- 
cellular signals despite their dormancy and 
thereby alter their future likelihood of trigger- 
ing germination. Although there are no well- 
established mechanisms for dormant cells to 
integrate extracellular information, the ability 
of spores to modulate their future response 
suggests a conceptual similarity to a decision- 
making mechanism in neuroscience known as 
integrate-and-fire (11, 12). This mechanism de- 
scribes how neurons respond to small syn- 
aptic inputs before reaching the threshold that 
triggers an action potential (13). It is unclear 
whether dormant spores use a similar mecha- 
nism to process environmental inputs and mod- 
ulate their approach toward a threshold that 
triggers germination. 

Given the physiological inactivity of spores, 
we investigated a possible integration mecha- 
nism based on passive ion flux, which does not 
require cellular energy. Our findings indicate 
that physiologically inactive spores integrate 
environmental signals by modifying preexist- 
ing ion gradients that were established dur- 
ing sporulation. Dormant spores can thus use 
stored electrochemical potential energy to reg- 
ulate their cell-fate decision without requiring 
de novo adenosine triphosphate (ATP) synthe- 
sis. In this way, spores can alter their distance 
to the germination threshold depending on 
environmental inputs while still in the dormant 
state. This mechanism also reconciles the robust 
dormancy of spores with the ability to gradu- 
ally become sensitized to future environmen- 
tal signals. 


Results 
B. subtilis spores can remain dormant despite 
exposure to germinant pulses 


We confirmed that similar to laboratory strains, 
undomesticated B. subtilis spores can be pre- 
treated with short nutrient (germinant) pulses 
to increase the likelihood of germination (8). 
Specifically, we imaged spores (Fig. 1B) within 
a microfluidic device that allows single-cell 
monitoring and precise control over the com- 
ponents in the incubation medium (materials 
and methods). We optically tracked the switch 
in phase-contrast brightness that results from 
the rehydration of spores during germination 
(4, 14) (Fig. 1C and fig. S1, A and B). Using this 
experimental approach, we exposed thousands 
of spores to a single short germinant pulse 
[10 mM L-alanine for 3 min] and found that 
~95% of spores remained dormant (95.2% ± 
1.9%, n = 2244) (Fig. 1D). We used the germinant 
L-alanine because it is a naturally occurring 
nutrient that triggers germination through 
designated receptors in bacterial spores (5). 
Spores that did not germinate upon stimula- 
tion remained dormant for at least the next 
20 hours of imaging (fig. S1C). Any spore that 
did germinate in response to the germinant 
pulse did so on average within 15 min (14.85 ± 
1.07, n = 1831) (fig. S1D). 

To quantify the integration capacity of spores, 
we applied a second germinant pulse, which was 
separated by 2 hours from the first pulse to 
ensure that germination in response to the first 
pulse of germinant had subsided. After the sec- 
ond pulse, approximately half of the remaining 
spores germinated (52.1% ± 6.2%). The germi- 
nation propensity of spores was independent of 
their location within the microfluidic chamber 
(fig. S2). We defined the spore’s integration ca- 
pacity as the population-level change (differ- 
ence) in the germination probability in response 
to two consecutive germinant pulses (Fig. 1E). 
The germination probability increased by 47% ± 
1% (mean ± SD, n = 9 replicate populations) be- 
tween the first and second pulse. Spores thus 
become sensitized by the first exposure and 
appear to move closer toward a germination 
threshold. 
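
The definition of integration capacity can be made concrete with a short calculation. The sketch below assumes, as the definition states, that each germination probability is computed relative to the spores still dormant before that pulse. The function names are hypothetical, and the input fractions are taken from the population-level percentages quoted above rather than from per-replicate data.

```python
def germination_probability(dormant_before: float, dormant_after: float) -> float:
    """Fraction of the spores dormant before a pulse that germinate in response to it."""
    return (dormant_before - dormant_after) / dormant_before


def integration_capacity(dormant_fractions: list) -> float:
    """Change in germination probability between the first and second pulse.

    dormant_fractions: [before pulse 1, after pulse 1, after pulse 2]
    """
    p_first = germination_probability(dormant_fractions[0], dormant_fractions[1])
    p_second = germination_probability(dormant_fractions[1], dormant_fractions[2])
    return p_second - p_first


# 95.2% remain dormant after pulse 1; 52.1% of those germinate after pulse 2.
after_first = 0.952
after_second = after_first * (1 - 0.521)

capacity = integration_capacity([1.0, after_first, after_second])
print(round(capacity, 3))
```

With these inputs the difference comes out near 0.47, in line with the reported increase, although the reported value is an average over nine replicate populations and need not match this pooled calculation exactly.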


A mathematical model of the role of ion flux 
in responding to germinant pulses 


To explain how physiologically inactive spores 
could integrate information about germinant 
exposure over time, we explored ion flux as 
a mechanism, as this process could occur pas- 
sively with preexisting ionic gradients estab- 
lished during sporulation. Other processes such 
as de novo gene expression and enzymatic ac- 
tivity typically require energy, which is highly 
limited in spores. In particular, we focused here 
on the flux of potassium because it is the most 
abundant intracellular ion in bacteria and has 
physiological roles in the stress response of 
B. subtilis (16-19). Furthermore, potassium 
ions have been proposed to stabilize the for- 
mation of bacterial spores (20). To investigate 
the possible role of potassium ion flux in dor- 
mant spores, we developed a mathematical 
model based on the Hodgkin-Huxley framework 
(21) (Fig. 1F, Box 1, and supplementary text). 
Our model describes how potassium ion flux 
can drive a spore toward a fixed germination 
threshold through an integrate-and-fire mech- 
anism, without requiring any physiological 
activity. 

Our mathematical model assumes that po- 
tassium ions enter or leave a spore through 
passive transport through both selective potas- 
sium channels and nonspecific ion channels. 
The direction and rate of potassium flux across 
the spore membrane depend on the potassium 
ion concentration gradient, as well as on the 
membrane potential of the spore (16-19) (sup- 
plementary text). Spores contain high amounts 
of potassium (22, 23), which would result in ion 
efflux when channels are open. We assume 
that ion pumps are inactive during dormancy, 
given that they require ATP for transport, which 
is highly limited and not actively produced in 
spores (6). Furthermore, the model assumes 
that ion channels (both potassium specific 
and nonspecific) are closed until germinant is 
added. The channels open in the presence of 
germinant and close in its absence. Lastly, we 
assume that the initial potassium content of 
spores has some variability and that germi- 
nation begins when a spore’s internal potas- 
sium concentration drops below a certain value 
(that is, it reaches the germination threshold) 
(Fig. 1, G and H). Given these assumptions, the 
model predicts that, when exposed to consec- 
utive short germinant pulses, few spores ger- 
minate during the first pulse, whereas most 
spores do so during the second or subse- 
quent pulses (Fig. 1I). This increase in the 
germination probability of spores is consistent 
with our experimental observations (Fig. 1D). 
Different germinant concentrations in the first 
pulse did not markedly change the fraction of 
germinated spores (fig. S3). By contrast, the 
germinated fraction increased with higher con- 
centration of L-alanine in the second pulse. This 
difference in the sensitivity of spores to the first 
and second germinant pulse concentrations 
further demonstrates integration of inform- 
ation. The efflux of potassium from dormant 
spores was also confirmed with an extracel- 
lular potassium indicator, Asante Potassium 
Green-4 tetramethylammonium salt (APG-4 
TMA) (fig. S4, A and B). Our modeling ap- 
proach thus shows how spores might use 
intracellular potassium concentration to inte- 
grate information about previous germinant 
exposures and change their sensitivity to fu- 
ture exposures. 
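
The qualitative behavior described in this section can be reproduced with a deliberately simplified toy simulation. It is not the model from Box 1: membrane potential is ignored, efflux is taken to be proportional to the potassium gradient while channels are open, and every parameter value (initial distribution, threshold, conductance, step count) is an arbitrary assumption chosen only for illustration.

```python
import random


def simulate_pulses(n_spores=2000, n_pulses=3, seed=0,
                    k_mean=1.0, k_sd=0.1, threshold=0.7,
                    conductance=0.05, pulse_steps=3, k_external=0.0):
    """Return the dormant fraction after each germinant pulse (toy model)."""
    rng = random.Random(seed)
    # Variable initial potassium content, in arbitrary units.
    potassium = [rng.gauss(k_mean, k_sd) for _ in range(n_spores)]
    dormant = [True] * n_spores
    dormant_fraction = []
    for _pulse in range(n_pulses):
        for i in range(n_spores):
            if not dormant[i]:
                continue
            # Channels are open only during the pulse: passive efflux down the gradient.
            for _step in range(pulse_steps):
                potassium[i] -= conductance * (potassium[i] - k_external)
            # Germination once internal potassium falls below the threshold.
            if potassium[i] < threshold:
                dormant[i] = False
        # Between pulses channels are closed, so potassium stays where it is.
        dormant_fraction.append(sum(dormant) / n_spores)
    return dormant_fraction


fractions = simulate_pulses()
p_first = 1 - fractions[0]
p_second = (fractions[0] - fractions[1]) / fractions[0]
print(fractions, p_first, p_second)
```

Because potassium lost during one pulse is not recovered before the next, each spore carries a record of its previous exposures, so the germination probability rises from the first pulse to the second even though the pulses are identical.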


Fig. 1. B. subtilis spores integrate over two consecutive germinant pulses. (A) Bacterial spores can remain in 
dormancy (shaded area in blue) for years seemingly without any biological activity. It is thus unclear how spores sense 
environmental cues while dormant and before triggering germination. (B) Filmstrip from phase-contrast microscopy 
that shows the fractional germination response to the pulses. Spores contained in a microfluidic chip were subjected to 
3-min germinant pulses (10 mM L-alanine, dotted vertical lines) separated by 2-hour intervals. These pulses triggered 
germination of a subset of spores, which was detected by phase-contrast imaging: White dormant spores become 
phase dark when germinating as they rehydrate. Spores that maintain dormancy despite exposure to germinant pulses 
provoke the question of whether they can sense and process such environmental information. Scale bar, 5 μm. 
(C) Single-cell time traces showing the change in the normalized phase-contrast intensity during spore germination 
[n = 200, subset of data from (D)]. Collective fluctuations in the image intensity are due to subtle changes in camera 
focus. (D) Fraction of dormant spores after each germinant pulse (n = 2244). The abrupt decrease in the dormant 
fraction after the second germinant pulse indicates the ability of spores to integrate signals over consecutive pulses. 
(E) The germination probability in each pulse is calculated based on the remaining dormant spores before each 
germinant pulse. The difference in the germination probability between the two pulses (vertical arrow) provides a metric 
to quantify the information integration by spores. (F) Cartoon showing the main components of our mathematical 
model. The flux of potassium in a spore is assumed to depend partially on the difference between its internal (K_i) 
and external (K_e) concentrations, the K-channel conductance (g_K), and the membrane potential (V) of the spore. 
(G) Spore's approach to the germination threshold is dictated by initial potassium content (K_i) and potassium efflux. 
(H) In the mathematical model, the initial potassium content (K_i, t = 0) and the K-channel conductance (g_K) determine 
the potassium dynamics in spores [K_i(t)]. These dynamics determine the spore's propensity to germinate (see 
Box 1 and supplementary text). (I) Simulated fraction of dormant spores after each germinant pulse. 


Initial potassium concentrations define the 
distance to the germination threshold 


Our mathematical model assumes that the ini- 
tial potassium content defines the distance to 
the germination threshold (Fig. 1, G and H). To 
test this, we generated a mutant strain in which 
the KtrC subunit of the KtrCD potassium im- 
porter was deleted, which is expected to lower 
intracellular potassium content and, conse- 
quently, put the spore closer to the threshold 
compared with wild-type spores (Fig. 2A). KtrC 
is the major potassium importer expressed in 
the inner spore membrane during the sporula- 
tion process (24), which enables potassium up- 
take (25, 26) (Fig. 2, B and C). By generating 
spores in the presence of an intracellular po- 
tassium indicator, APG-4 acetoxymethyl ester 
(AM), we confirmed that the ΔktrC spores con- 
tain less potassium than do wild-type spores 
(fig. S4, C through F). Accordingly, the ΔktrC 
spores are mathematically predicted to be 
more likely to germinate in response to the 
first germinant pulse (Fig. 2D). Indeed, mea- 
surements show that 42% of the ΔktrC spores 
germinated after the first pulse, compared with 
5% of the wild-type spores (Fig. 2, E through G, 
and movie S1). The germinated fraction of ΔktrC 
spores then further increased after the second 
pulse (Fig. 2H). We obtained similar results 
with spores that lacked KtrD, the other sub- 
unit of the KtrCD potassium importer (fig. S5A). 

Given the high germination probability of 
ΔktrC spores, almost the entire population 
(~94%) exited the dormant state after only 
two germinant pulses (Fig. 2G). However, be- 
cause the deletion of ktrC does not affect po- 
tassium efflux in spores, the integration capacity 
of ΔktrC spores is comparable to that of wild- 
type spores (Fig. 2I). To further confirm that 
the lower potassium content of ΔktrC spores 
caused the higher sensitivity to germinant 
pulses, we supplied additional potassium dur- 
ing the sporulation of ΔktrC cells. Increasing 
the potassium content in ΔktrC spores should 
increase their distance to the germination 
threshold, which in turn would be reflected 
in a decrease in their germination probabil- 
ity (fig. S5B). In agreement with those expec- 
tations, the addition of 150 mM of potassium 
in the sporulation medium resulted in ΔktrC 
spores that responded to germinant pulses 
similarly to wild-type spores (fig. S5, C and D). 
These results are consistent with the modeling 
prediction and support the idea that the initial 
intracellular potassium concentration of spores 
specifically defines their distance to the ger- 
mination threshold. 


Potassium ion channels contribute to the 
integration capacity of spores 


We investigated the modeling prediction that 
spores use potassium efflux to integrate over 
consecutive germinant pulses. To this end, we 
studied a mutant strain lacking the YugO po- 
tassium ion channel (27) (Fig. 3, A and B). We 
confirmed with the intracellular potassium 
indicator APG-4 AM that ΔyugO spores contain 
less potassium than do wild-type spores 
(fig. S4, C through F). Therefore, the ΔyugO 
spores should be initially closer to the germi- 
nation threshold and be more sensitive than 
wild-type spores to the first germinant pulse 
(Fig. 3, A through D). However, the absence 
of the YugO channel also implies a reduced 
potassium efflux in response to germinant 
pulses. According to our model, such reduced 
potassium efflux in germinant-exposed spores 
should lower their integration capacity, and 
thus, the germination probability for subse- 
quent germinant pulses would be lower (Fig. 
3D). In other words, the ΔyugO spores would 
not markedly increase their sensitivity to con- 
secutive germinant pulses, which should dis- 
tinguish this strain from the wild-type and the 
ΔktrC strains. These mutant spores are there- 
fore predicted to approach the germination 
threshold more gradually because of reduced 
potassium efflux. 

We experimentally tested these predictions. 
Essentially, the progression of the wild-type 
and the ΔyugO spores toward the germination 
threshold is predicted to exhibit a crossover 
point (Fig. 3D). Experiments confirmed that 
the ΔyugO spores have a higher response than 
the wild type to the first germinant pulse (with 
30% versus 5% of the spores germinating, re- 
spectively) (Fig. 3, E, F, and H; movie S2). 
However, these spores lacked the increase in 
germination probability in response to sub- 
sequent pulses exhibited by the wild-type spores 
(Fig. 3I). This, in turn, reflects a substantial loss 
in integration capacity of the ΔyugO spores 
when compared with the wild type (Fig. 3J 
and fig. S6). The phenotype of the ΔyugO spores 
(high initial germination probability and low 
integration capacity) is thus consistent with 
our modeling predictions. Notably, the ΔyugO 
strain also indicates that the reduced efflux of 
potassium decreases the integration capacity 
of spores. These results suggest that potassium 
efflux serves as an integration mechanism that 
modulates the approach to the germination 
threshold. 

Given the complex phenotype of the ΔyugO 
strain, we turned to chemical perturbations of 
potassium flux in wild-type spores to inde- 
pendently determine whether potassium flux 
underlies the integration capacity of spores. 
We confirmed that modifying the external 
potassium concentration changed the inte- 
gration capacity of wild-type spores (fig. S6). 
Specifically, the absence of potassium in the 
medium, which we expected to promote higher 
potassium efflux in spores, increased the in- 
tegration capacity (from 0.47 ± 0.01 to 0.63 ± 
0.02). By contrast, increasing extracellular po- 
tassium concentration lowered the integration 
capacity (Media + 600 mM KCl: 0.32 ± 0.01; 
Media + 1 M KCl: 0.03 ± 0.01). To test whether 
reduction in integration capacity might result 
from increased osmotic stress, we showed that 
adding 1 M sorbitol had no effect on integra- 
tion capacity (Media + 1 M sorbitol: 0.44 ± 0.02). 
The electrochemical gradient of potassium thus 
influences the integration capacity of spores. 

We also tested how the integration capacity 
of wild-type spores is affected by blocking po- 
tassium channels with the drug quinine (1 mM) 
(Fig. 3, A through C, and figs. S4B and S6) (28-31). 
According to our model, such blocking of po- 
tassium channels is expected to specifically re- 
duce the germination probability in response 
to consecutive germinant pulses (Fig. 3D). In 
agreement with this prediction, treatment of 
spores with quinine reduced the response of 
wild-type spores to the second germinant pulse, 
with around 80% of the spores remaining dor- 
mant, in comparison with around 45% in the 
absence of the drug (Fig. 3, G, H, and I; movie S3). 
These results support the proposed integrate- 
and-fire mechanism by showing that similar 
to the deletion of the YugO channel, chemical 
blocking of potassium efflux in wild-type spores 
also impairs their integration capacity (Fig. 3J 
and fig. S6). 


Changes in the electrochemical potential 
of dormant spores 


Our mathematical model proposes that the 
flux of potassium ions driving the processing 
of information during dormancy is modulated 
by the electrochemical potential of the spores. 
According to the integrate-and-fire mechanism, 


Box 1. Mathematical model of dormant spore electrophysiology 


According to the processes depicted in Fig. 1F, we assume that the changes in concentrations of extra-
cellular and intracellular potassium ($K_e$ and $K_i$, respectively) are governed by the flow of potassium ions
through the spore membrane:

$$\frac{dK_e}{dt} = F g_K n^4 (V - V_K) + F g_n n^4 (V - V_n) - \gamma_e (K_e - K_m)$$

$$\frac{dK_i}{dt} = -F g_K n^4 (V - V_K) - F g_n n^4 (V - V_n)$$

The model includes ion flow through both specific and nonspecific channels, with conductances $g_K$ and
$g_n$, respectively. Additionally, extracellular potassium is subject to a relaxational term (third term in the
right-hand side of the $K_e$ equation) that pulls it to the concentration of potassium in the medium, $K_m$. Ion flow
through the channels depends on the electrochemical state of the spore, given by its membrane potential $V$
and reversal potentials $V_K$ and $V_n$, which correspond to specific and nonspecific ions, respectively. Crucially,
the reversal potential of potassium depends on the potassium concentrations through the Nernst equation:

$$V_K = V_{K0} \ln(K_e / K_i)$$


As mentioned in the main text, the channels are assumed to open in the presence of germinant, which 
produces an outward flux of potassium and consequently a sudden increase in membrane potential. The 
dynamics of the membrane potential V and gating variable n are described in the supplementary text, 
together with the rest of the parameters and all parameter values. 
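
As a small numerical illustration of the Nernst relation in Box 1, the sketch below evaluates the potassium reversal potential for several extracellular concentrations. The prefactor `v_k0` and all concentrations are placeholder values chosen for illustration (the fitted parameter values are given in the supplementary text, not here), so only the qualitative trend should be read from it.

```python
import math

def reversal_potential(k_e, k_i, v_k0=25.0):
    """Nernst-type reversal potential V_K = V_K0 * ln(K_e / K_i).

    v_k0 (mV) and the concentrations used below are illustrative
    assumptions, not the parameter values used in the paper.
    """
    if k_e <= 0 or k_i <= 0:
        raise ValueError("concentrations must be positive")
    return v_k0 * math.log(k_e / k_i)

K_I = 300.0  # hypothetical intracellular potassium concentration (mM)
for k_e in (1.0, 10.0, 100.0, 300.0):
    v_k = reversal_potential(k_e, K_I)
    print(f"K_e = {k_e:6.1f} mM -> V_K = {v_k:7.1f} mV")
```

With the intracellular concentration held fixed, lowering the extracellular concentration makes the reversal potential more negative, and it vanishes when the two concentrations are equal.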
Fig. 2. Role of potassium in the
germination threshold. (A) Model
predicts that the deletion of the
KtrC importer reduces the distance
of spores to their germination
threshold. This, in turn, will
increase the germination propen-
sity of ΔktrC spores compared
with wild-type (WT) spores.
(B) WT cells contain both potas-
sium importers and ion channels.
(C) The ktrC mutant spores
(ΔktrC) lack the gene for the
potassium importer KtrC.
(D) Model-generated dormant
fraction of the WT (blue), and the
ΔktrC (green) strains. Dotted
vertical lines indicate germinant
pulses. (E and F) Phase-contrast
microscopy filmstrips for repre-
sentative WT (E) and ΔktrC spores
Error bars represent standard deviation. (I) Bar plot showing the integration capacity of the WT and the ΔktrC strains (0.47 ± 0.01 and 0.48 ± 0.03, respectively)
calculated from the differences in germination probabilities shown in (H). Error bars represent standard deviation.


spores that are further from their germination 
threshold would require multiple germinant 
pulses to reach the threshold, each pulse caus- 
ing an incremental electrochemical potential 
change (Fig. 4A). Specifically, the transient ef- 
flux of potassium cations triggered by germinant 
pulses is mathematically predicted to increase 
the negative electrochemical potential of spores 
in a step-like manner, even when the pulses do 
not trigger germination (Fig. 4B). To test this 
prediction, we used a previously characterized 
cationic fluorescent dye, thioflavin-T (ThT) to 
measure changes in the electrochemical poten- 
tial of dormant spores (materials and methods) 
(16, 32). As spores are notoriously imperme- 
able to most chemicals (33), we expected that 
peripheral staining by ThT would reflect the 
spore’s overall negative electrochemical poten- 
tial (20). 

To experimentally test our modeling predic- 
tion of electrochemical potential jumps, we 
tracked thousands of individual wild-type 
spores over time and simultaneously imaged
phase-contrast and ThT fluorescence inten-
sities (Fig. 4, C through F). Spores that did
not trigger germination exhibited sudden 
changes in their electrochemical potential in 
response to germinant pulses (movie S4). Spores 
that required multiple germinant pulses to 
trigger germination exhibited a multistep pro- 
gression before reaching their germination 
threshold. These increases in the ThT signal 
were not due to increased spore permeability, 
as ThT continued to stain the spore’s periph- 
ery and did not transition to its interior (Fig. 
4D and movie S4). Therefore, accumulation of 
ThT on the spore periphery appears to reflect 
changes in the ionic content of the spore. We 
observed no characteristic changes of the phase- 
contrast brightness of dormant spores during 
the increases in the ThT signal (fig. S7A). We 
also tested L-valine, another naturally occurring
germinant (34), and observed similar changes 
in ThT signal (fig. S7B). Furthermore, we used 
another positively charged dye, tetramethyl- 
rhodamine methyl ester (TMRM), which is 


commonly used to measure the electrochem- 
ical potential of cells (35). TMRM also stained 
the periphery of spores, and increases in the 
TMRM signal amplitude were qualitatively 
similar to those measured with ThT (fig. S8, 
A and B). To validate that the observed jumps
in fluorescence during germinant additions 
were not simply a staining artifact, we synthe- 
sized a charge-neutral version of ThT (Fig. 4E, 
inset; figs. S8, C and D, and S9; materials and 
methods; and supplementary text). Although 
this charge-neutral ThT dye also stained the 
spore periphery, we observed no increases in 
the signal amplitude during germinant pulses. 
Instead, the fluorescence signal of the neutral 
ThT dye monotonically decayed over time, which 
was likely due to photobleaching. To exclude 
the possibility that the observed changes in 
electrochemical potential could be related to 
the release of CaDPA during the initiation of 
germination, we generated a mutant strain 
that lacked a subunit of the SpoVA channel, 
namely SpoVAF (fig. S10A). This subunit is
essential for effective CaDPA release during
germination, and its deletion causes a delay 
in germination (36). The deletion of SpoVAF 
slowed the response time of spores to L-alanine
(fig. S10B). However, loss of spoVAF did not
affect the electrochemical potential changes 
that we observed in spores (fig. S10, C and D), 
which indicates that CaDPA release is not 
required for the integration of information in 
dormant spores. Together, these results dem- 
onstrate that germinant pulses cause sudden 
changes in the electrochemical potential of 
spores that otherwise remain dormant. As pre- 
dicted, spores that required multiple germinant 
pulses to initiate germination also exhibited 
multiple jumps in their electrochemical poten-
tial, which indicates their greater distance to the
germination threshold. The ability to visualize
and observe a multistep and gradual approach 
of dormant spores toward their germination 
threshold provides further evidence in support 
of the integrate-and-fire mechanism. 
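
The multistep approach to threshold described above can be mimicked with a deliberately simplified toy simulation. This is not the authors' electrophysiological model: the state variable, step sizes, and initial distribution are all assumptions made for illustration. Each pulse moves a dormant spore a small step toward a fixed threshold, and a spore germinates once its accumulated state reaches that threshold.

```python
import random

def simulate_spores(n_spores=1000, n_pulses=3, threshold=1.0,
                    step_mean=0.3, step_sd=0.05, seed=0):
    """Toy integrate-and-fire: each germinant pulse adds a non-negative step
    to a spore's state; the spore germinates when the state reaches threshold.

    All numbers are illustrative assumptions, not fitted parameters.
    Returns the dormant fraction after each pulse.
    """
    rng = random.Random(seed)
    # Spores start at different distances from the germination threshold.
    states = [rng.uniform(0.0, threshold) for _ in range(n_spores)]
    dormant = [True] * n_spores
    dormant_fraction = []
    for _ in range(n_pulses):
        for i in range(n_spores):
            if dormant[i]:
                states[i] += max(0.0, rng.gauss(step_mean, step_sd))
                if states[i] >= threshold:
                    dormant[i] = False  # "fire": germination is irreversible
        dormant_fraction.append(sum(dormant) / n_spores)
    return dormant_fraction

print(simulate_spores())
```

In this sketch, spores that start far from the threshold need several pulses before germinating, which is the qualitative behavior the stepwise ThT traces are interpreted to show.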

We investigated whether the integration 
capacity that we determined from population- 
level statistics correlated with the indepen- 
dently observed jumps in the electrochemical 
potential of individual spores. In particular, a 
higher integration capacity should correlate 
with a higher change in electrochemical poten- 
tial. We therefore measured germinant-induced 
changes in the electrochemical potential for 
thousands of spores obtained from various 
perturbations considered in this study (Fig. 4, 
F and G). Higher integration capacity correlated 


with a higher average increase in electrochem- 
ical potential (Fig. 4H). Specifically, wild-type 
and ΔktrC spores, which have similarly high
changes in electrochemical potential, also ex- 
hibited relatively higher integration capacities. 
Furthermore, the ΔyugO and the quinine-
exposed wild-type spores, which both exhibited
lower average changes in their electrochemical 
potential, had reduced integration capacities. 
These findings support the prediction that dor- 
mant spores integrate environmental informa- 
tion through ion flux-induced changes in their 
electrochemical potential. 


Discussion 


We studied how physiologically inactive spores 
detect and respond to transient germinant pulses.
Fig. 4. Dormant spores
exhibit sudden changes in 
their electrochemical poten- 
tial, visualizing integration over 
germinant pulses. (A) Cartoon 
illustrating the hypothesis that 
spores release potassium after 
each germinant pulse, generating 
a change in their electrochemical 
potential. (B) Mathematically 
predicted stepwise membrane 
potential (mV) jumps when 
spores are exposed to germinant 
pulses (dotted vertical lines). 
Depicted are representative time 
traces for individual spores 
that germinate in response to 
different germinant pulse num- 
bers. The termination of the 
time trace indicates germina-
tion. (C) Phase-contrast
images of a spore that remains 
dormant (phase bright) 
despite exposure to three con- 
secutive germinant pulses. 
Dotted vertical lines indicate 
germinant pulse exposure. 
Scale bar, 1 μm. (D) Top fluo-
rescence filmstrip shows the
color-coded electrochemical 
potential amplitude [ThT, 
arbitrary units (a.u.)] of the 
spore depicted in (C). 

The other three filmstrips 
below show individual spores, 
each of which germinate 

in response to different 
germinant pulses. (E) Single- 
cell time traces of the 
electrochemical potential 
signal (ThT, a.u.) for the 
corresponding spores 

in Fig. 3D (see fig. S7A for
corresponding phase-contrast 
traces). The termination of 
the time traces indicates 
germination. The inset 

shows the time trace of a 
single spore stained with 

the charge-neutral ThT 
fluorescent dye (see fig. S8D 
for data from multiple 
spores). (F) Measurement of 
3484 individual WT spores 
showing amplitude color-coded 
time traces that show the
(I) Conceptual summary of the proposed integrate-and-fire mechanism. Spores integrate germinant exposure information over
m ions. The resulting change in electrochemical potential drives them toward a germination threshold. Spores that reach the threshold
, which is marked by the abrupt change in phase-contrast refractility.
Our results reveal that despite their dormancy,
spores can integrate extracellular information 
and alter their intrinsic state. This ability to 
process information appears to be supported 
by preexisting ionic gradients generated dur- 
ing sporulation. In this way, dormant spores 
can reach the decision to initiate germination 
by using electrochemical potential energy, rather 
than requiring a source of cellular energy, such 
as ATP. Spores may thus be analogous to a bi- 
ological capacitor in that they store and use an 
electrochemical potential to move closer to the 
germination threshold (Fig. 4I). The integrate-
and-fire model proposed here provides both a
conceptual and mechanistic explanation for 
how spores can respond to an environmental 
signal despite being physiologically inactive. 
The ability to sum inputs over time before reach- 
ing a threshold ensures that germination is 
triggered only when favorable conditions per- 
sist while ignoring small environmental fluctua- 
tions. Although the integrate-and-fire model is 
used to describe how neurons process informa- 
tion, our work suggests that this concept may 
represent a more general solution to the need 
for information processing in diverse biolog-
ical systems, including energy-limited cells.
PROTEIN DESIGN


Robust deep learning-based protein sequence design using ProteinMPNN


J. Dauparas, I. Anishchenko, N. Bennett, H. Bai, R. J. Ragotte, L. F. Milles, B. I. M. Wicky,
A. Courbet, R. J. de Haas, N. Bethel, P. J. Y. Leung, T. F. Huddy, S. Pellock, D. Tischer,
F. Chan, B. Koepnick, H. Nguyen, A. Kang, B. Sankaran, A. K. Bera, N. P. King, D. Baker*


Although deep learning has revolutionized protein structure prediction, almost all experimentally characterized 
de novo protein designs have been generated using physically based approaches such as Rosetta. Here, we 
describe a deep learning-based protein sequence design method, ProteinMPNN, that has outstanding
performance in both in silico and experimental tests. On native protein backbones, ProteinMPNN has a sequence 
recovery of 52.4% compared with 32.9% for Rosetta. The amino acid sequence at different positions can be 
coupled between single or multiple chains, enabling application to a wide range of current protein design 
challenges. We demonstrate the broad utility and high accuracy of ProteinMPNN using x-ray crystallography, cryo-electron
microscopy, and functional studies by rescuing previously failed designs, which were made using Rosetta or
AlphaFold, of protein monomers, cyclic homo-oligomers, tetrahedral nanoparticles, and target-binding proteins. 


The protein sequence design problem is to
find, given a protein backbone structure 
of interest, an amino acid sequence that 
will fold to this structure. Physically based 
approaches such as Rosetta treat sequence 
design as an energy optimization problem, 
searching for the combination of amino acid 
identities and conformations that has the 
lowest energy for a given input structure. Re-
cently, deep-learning approaches have shown
promise in rapidly generating candidate amino 
acid sequences given monomeric protein back- 
bones without the need for compute-intensive 
explicit consideration of side chain rotameric 
states (1-7). However, the methods described
thus far do not apply to the full range of cur- 
rent protein design challenges and have not 
been extensively validated experimentally. 
We sought to develop a deep learning-based 
protein sequence design method that is broad- 
ly applicable to the design of monomers, cyclic 
oligomers, protein nanoparticles, and protein- 
protein interfaces. We began from a previous- 
ly described message-passing neural network 
(MPNN) with three encoder and three decoder 
layers and 128 hidden dimensions that pre-
dicts protein sequences in an autoregressive
manner from the N to C terminus using pro-
tein backbone features (distances between
Cα-Cα atoms, relative Cα-Cα-Cα frame orien-
tations and rotations, and backbone dihedral
angles) as input (1). We first sought to im-
prove performance of the model on recovering
the amino acid sequences of native single- 
chain proteins given their backbone structures. 
A set of 19,700 high-resolution single-chain 
structures from the Protein Data Bank (PDB) 
were split into train, validation, and test sets 
(80/10/10) based on the CATH (8) protein clas- 
sification database (see methods). We found 
that including distances between N, Cα, C, O,
and a virtual Cβ placed based on the other
backbone atoms as additional input features 
resulted in a sequence recovery increase from 
41.2% (baseline model) to 49.0% (experiment 1) 
(see Table 1); interatomic distances evidently 
provide a better inductive bias to capture inter- 
actions between residues than dihedral angles 
or N-Cα-C frame orientations. We also observed
performance improvements with edge updates 
in addition to the node updates in the back- 
bone encoder neural network (experiment 2). 
Combining the additional input features and 
edge updates leads to a sequence recovery of 
50.5% (experiment 3). To determine the range 
over which backbone geometry influences 
amino acid identity, we tested 16, 24, 32, 48, 
and 64 nearest-Cα neighbor neural networks
(fig. S1A) and found that performance was sat-
urated at 32 to 48 neighbors. Unlike the protein
structure prediction problem, locally connected 
graph neural networks can accurately model 
the structure-to-sequence mapping problem 
because the optimality of an amino acid at a
particular position is largely determined by 
the immediate protein environment. 
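
To make the idea of a locally connected graph concrete, the sketch below builds a k-nearest-neighbor graph from Cα coordinates. It is a minimal illustration, not the ProteinMPNN implementation; the function name and the toy coordinates are assumptions.

```python
import math

def knn_graph(ca_coords, k):
    """For each residue, return the indices of its k nearest residues
    (excluding itself) by Cα-Cα Euclidean distance."""
    neighbors = []
    for i, xi in enumerate(ca_coords):
        dists = [(math.dist(xi, xj), j)
                 for j, xj in enumerate(ca_coords) if j != i]
        dists.sort()  # nearest first; ties broken by residue index
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

# Hypothetical toy backbone: five Cα atoms spaced 3.8 Å apart along a line.
coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
print(knn_graph(coords, k=2))
```

Each residue then exchanges messages only with the residues in its neighbor list, so the cost of a layer grows with the number of residues times k rather than with all residue pairs.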

To enable application to a broad range of 
single- and multichain design problems, we 
replaced the fixed N to C terminal decoding 
order with an order-agnostic autoregressive 
model in which the decoding order is randomly 
sampled from the set of all possible permuta- 
tions (9). This also resulted in a modest im- 
provement in sequence recovery (Table 1; 
experiment 4). Order-agnostic decoding en- 
ables design in cases where, for example, the 
middle of the protein sequence is fixed and 
the rest needs to be designed, as in protein 
binder design where the target sequence is 
known; decoding skips the fixed regions but 
includes them in the sequence context for the 
remaining positions (Fig. 1B). For multichain 
design problems (see discussion later in the 
text), to make the model equivariant to the 
order of the protein chains, we kept the per 
chain relative positional encoding capped at 
±32 residues (10) and added a binary feature
that indicates whether the interacting pair of 
residues are from the same or different chains. 
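
The order-agnostic decoding with fixed regions described above can be sketched as follows. This is a hedged illustration only: `decoding_order`, `design`, and the `predict` callback are hypothetical names, and `predict` stands in for the trained network.

```python
import random

def decoding_order(length, fixed_positions, seed=None):
    """Random decoding order in which fixed positions come first, so that
    they are available as sequence context for every designed position."""
    rng = random.Random(seed)
    fixed = sorted(set(fixed_positions))
    fixed_set = set(fixed)
    designable = [i for i in range(length) if i not in fixed_set]
    rng.shuffle(designable)
    return fixed + designable

def design(length, fixed, predict, seed=None):
    """fixed maps position -> amino acid. predict(position, context) is a
    hypothetical stand-in for the network and returns one amino acid."""
    order = decoding_order(length, fixed.keys(), seed)
    seq = dict(fixed)  # fixed residues are part of the context from the start
    for pos in order:
        if pos in fixed:
            continue  # skip: the identity is already known
        seq[pos] = predict(pos, dict(seq))
    return "".join(seq[i] for i in range(length))

# Placeholder predictor that always proposes alanine.
print(design(6, {2: "W", 3: "P"}, lambda pos, context: "A", seed=0))
```

Because the fixed residues are decoded first, every designed position can condition on them regardless of where they sit in the chain, which a fixed N-to-C order cannot do for positions that precede the fixed region.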

We used the flexible decoding order to fix 
residue identities in sets of corresponding po-
sitions (the residues at these positions are
decoded at the same time). For example, for 
a homodimer backbone with two chains A 
and B with sequence A_1, A_2, ... and B_1, B_2, ...,
the amino acids for chains A and B have to
be the same for corresponding indices; we
implement this by predicting unnormalized
probabilities for A_1 and B_1 first and then com-
bining these two predictions to construct a
normalized probability distribution from 
which a joint amino acid is sampled (Fig. 1C). 
For pseudosymmetric sequence design, res- 
idues within or between chains can be sim- 
ilarly constrained; for example, for repeat 
protein design, the sequence in each repeat 
unit can be kept fixed. Multistate design of 
single sequences that encodes two or more 
desired states can be achieved by predicting 
unnormalized probabilities for each state 
and then averaging; more generally, a linear 
combination of predicted unnormalized prob- 
abilities with some positive and negative co- 
efficients can be used to upweight or downweight 
specific backbone states to achieve explicit 
positive or negative sequence design. The ar- 
chitecture of this multichain and symmetry- 
aware (positionally coupled) model, which we 
call ProteinMPNN, is outlined schematically 
in Fig. 1A. We trained ProteinMPNN on protein 
assemblies in the PDB (as of 2 August 2021) 
determined by x-ray crystallography or cryo- 
electron microscopy (cryo-EM) to better than 
3.5-Å resolution and with fewer than 10,000
residues (see methods). 
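
A minimal sketch of the tying scheme described above: unnormalized log-probabilities (logits) predicted for corresponding positions are combined linearly, normalized with a softmax, and one shared amino acid is sampled. The function names and the use of a 20-letter alphabet are assumptions for illustration; negative weights correspond to the negative-design case mentioned in the text.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_tied(logits_per_position, weights=None, rng=random):
    """Combine logits from tied positions and sample one shared amino acid.

    logits_per_position: one list of 20 logits per tied position.
    weights: linear-combination coefficients (default: plain average).
    Returns (amino_acid, probabilities).
    """
    n = len(logits_per_position)
    if weights is None:
        weights = [1.0 / n] * n
    combined = [sum(w * logits[a] for w, logits in zip(weights, logits_per_position))
                for a in range(len(AMINO_ACIDS))]
    probs = softmax(combined)
    return rng.choices(AMINO_ACIDS, weights=probs, k=1)[0], probs

# Hypothetical logits for position 1 of chains A and B of a homodimer.
logits_a = [3.0] + [0.0] * 19
logits_b = [1.0] + [0.0] * 19
residue, probs = sample_tied([logits_a, logits_b], rng=random.Random(0))
print(residue, round(probs[0], 3))
```

The sampled residue is then written to every tied position, so the chains stay identical by construction.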

For a test set of 402 monomer backbones, 
we redesigned sequences using Rosetta fixed 
backbone combinatorial sequence design [one 
round of the PackRotamersMover (11, 12) with
default options and the beta_nov16 score func-
tion] and ProteinMPNN. Although requiring
only a small fraction of the compute time 


(1.2 versus 258.8 s on a single CPU for 100 res- 
idues), ProteinMPNN had a much higher over- 
all native sequence recovery (52.4 versus 32.9%), 
with improvements across the full range of 
residue burial from protein core to surface 
(Fig. 2A). Differences between designed and 
native amino acid biases for the core, bound- 
ary, and surface regions for the two methods 
are shown in fig. S2.

We further evaluated ProteinMPNN on a 
test set of 690 monomers, 732 homomers (with 
fewer than 2000 residues), and 98 heteromers. 
The median sequence recoveries over all residues 
were 52% for monomers, 55% for homomers, 
and 51% for heteromers, and the median se- 
quence recoveries over interface residues were 
53% for homomers and 51% for heteromers (Fig. 
2B). In all three cases, sequence recovery corre- 
lated closely with residue burial, ranging from 
90 to 95% in the deep core to 35% on the surface 
(fig. S1B); the amount of local geometric context
determines how well residues can be recovered 
at specific positions. 
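
For reference, native sequence recovery as used in these comparisons is the fraction of positions at which the designed residue matches the native one. The sketch below computes it for a few made-up sequence pairs; the sequences are hypothetical and only illustrate the arithmetic.

```python
from statistics import median

def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    return sum(d == n for d, n in zip(designed, native)) / len(native)

# Hypothetical (designed, native) pairs, for illustration only.
pairs = [("MKVLA", "MKILA"), ("GSHMT", "GSHMT"), ("AAAAA", "AEAKA")]
recoveries = [sequence_recovery(d, n) for d, n in pairs]
print(recoveries, median(recoveries))
```

Restricting the comparison to a subset of positions (for example, interface or core residues) gives the per-region recoveries reported in the text.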


Training with backbone noise improves model 
performance for protein design 


Although protein sequence design approaches 
have often focused on maximizing sequence 
recovery for protein backbones from high- 
resolution crystal structures, this is not necessar- 
ily optimal for actual protein design applications. 
We found that training models on backbones 
to which Gaussian noise (SD = 0.02 Å) had
been added improved sequence recovery on 
confident protein structure models generated 
by AlphaFold [average predicted local-distance 
difference test (lDDT) > 80.0] from UniRef50,
whereas the sequence recovery on unperturbed 
PDB structures significantly decreased (Table 
1); crystallographic refinement may impart 
some memory of amino acid identity in the 


Table 1. Improvements in model performance on native protein sequence recovery. Test accuracy 
(percentage of correct amino acids recovered) and test perplexity (exponentiated categorical cross- 
entropy loss per residue) for models trained on the native backbone coordinates (value to the left of the 
slash) and models trained with Gaussian noise (SD = 0.02 Å) added to the backbone coordinates (value
to the right of the slash). Noise was only added during training, and all test evaluations are with no added 
noise. The final column shows sequence recovery on 5000 AlphaFold protein backbone models, with 
average predicted lDDT > 80.0, randomly chosen from UniRef50 sequences.
Fig. 1. ProteinMPNN architecture. (A) Distances between N, Cα, C, O, and
virtual Cβ are encoded and processed using a message-passing neural network
(Encoder) to obtain graph node and edge features. The encoded features, 
together with a partial sequence, are used to generate amino acids iteratively 
in a random decoding order. (B) A fixed left-to-right decoding cannot use 
sequence context (green) for preceding positions (yellow), whereas a model 
trained with random decoding orders can be used with an arbitrary decoding 


backbone coordinates, which is captured by 
models trained on crystal structure backbones 
and reduced by the addition of noise. Robust- 
ness to small displacements in atomic coor- 
dinates is a desirable feature in real-world 
applications for which the protein backbone 
geometry is not known at atomic resolution. 
AlphaFold (10) and RoseTTAFold (13) make
very good structure predictions for native pro- 
teins, given multiple sequence alignments that 
can contain substantial coevolutionary and 
other information that reflects aspects of the 
three-dimensional (3D) structure, but gen- 
erally produce less-accurate structure models 
when provided with only a single sequence. 
We reasoned that ProteinMPNN might gen- 
erate sequences for native backbones that 
more strongly encode the structures than the 
original native sequences, because evolution, 
in most cases, does not optimize for stability. 
Indeed, we found that ProteinMPNN se- 
quences generated for native backbones were 
predicted to fold to these structures much 
more confidently and accurately by AlphaFold 
than the original native sequences (Fig. 2E).
ProteinMPNN also strengthened the sequence-
to-structure mapping for designed backbones: 
Over a set of de novo-designed ligand binding 
pocket-containing scaffolds generated using 
Rosetta, only 2.7% of the original designed 
sequences were predicted to fold to the tar- 
get structures, but after ProteinMPNN rede- 
sign, 54.1% were confidently predicted to fold 
to the target structures (Fig. 2F). This should 
substantially increase the utility of these scaf- 
folds for the design of small-molecule bind- 
ing and enzymatic functions. 

We further found that the strength of the 
single sequence-to-structure mapping, as as- 
sessed by AlphaFold, was higher for models 
trained with additional backbone noise. As 
noted above, the average sequence recovery 
for crystallographically refined backbones 
decreases with increasing amounts of noise 
added during training (Fig. 2C) because these 
models blur out local details of the backbone 
geometry. However, sequences generated by 
noised ProteinMPNN models are more robust- 
ly decoded into 3D coordinates by AlphaFold, 
likely because noised models focus more on 



order during the inference. The decoding order can be chosen such that 

the fixed context is decoded first. (C) Residue positions within and between 
chains can be tied together, enabling symmetric, repeat protein, and 
multistate design. In this example, a homotrimer is designed with the coupling 
of positions in different chains. Predicted unnormalized probabilities for 

tied positions are averaged to get a single probability distribution from which 
amino acids are sampled. 


overall topological features as encoded by, 
for example, the overall polar-nonpolar se- 
quence pattern than local structural details. 
For example, a model trained with 0.3-Å noise
generated two to three times more sequences 
with AlphaFold predictions within lDDT-Cα
(14) of 95.0 and 90.0 of the true structures than 
unnoised or slightly noised models (Fig. 2C; 
training with higher levels of noise increases 
success rates for less-stringent lDDT cutoffs).
In protein design calculations, the models 
trained with larger amounts of noise have 
the advantage of generating sequences that 
more strongly map to the target structures by 
prediction methods (this increases the fre- 
quency at which designs pass prediction-based 
filters and may, correspondingly, also increase 
the frequency of folding to the desired target 
structure). 
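
The training-time augmentation discussed in this section amounts to perturbing every backbone coordinate with independent Gaussian noise. A minimal sketch, assuming coordinates are stored as (x, y, z) tuples in Å; the function name and the example fragment are hypothetical.

```python
import random

def add_backbone_noise(coords, sd, seed=None):
    """Return a copy of coords with independent Gaussian noise (standard
    deviation sd, in Å) added to every x, y, and z coordinate."""
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, sd) for c in atom) for atom in coords]

# Hypothetical three-atom backbone fragment (coordinates in Å).
backbone = [(0.0, 0.0, 0.0), (1.46, 0.0, 0.0), (2.0, 1.4, 0.0)]
print(add_backbone_noise(backbone, sd=0.02, seed=0))
print(add_backbone_noise(backbone, sd=0.3, seed=0))
```

As stated in the Table 1 caption, noise of this kind is applied only during training; evaluation uses unperturbed coordinates.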

Because the sequence determinants of pro- 
tein expression, solubility, and function are 
not perfectly understood, in most protein de- 
sign applications, it is desirable to test multiple 
designed sequences experimentally. We found 
that the diversity of sequences generated by
ProteinMPNN could be considerably increased,
with only a very small decrease in average se- 
quence recovery, by carrying out inference at 
higher temperatures (Fig. 2D). We also found 
that a measure of sequence quality derived 
from ProteinMPNN, the averaged log proba- 
bility of the sequence given the structure, 
correlated strongly with native sequence re- 
covery over a range of temperatures (fig. S3A), 
enabling rapid ranking of sequences for se- 
lection for experimental characterization. 
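
The two quantities in this paragraph can be illustrated with a short sketch: temperature-scaled sampling probabilities, and the average log-probability used as a ranking score. The logits and function names are assumptions for illustration; in the real model, each position's probabilities are conditioned on the structure and on previously decoded residues.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature; higher temperature flattens the
    distribution and therefore increases sequence diversity."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def average_log_probability(position_probs, chosen):
    """Mean log-probability of the chosen residue index at each position."""
    return sum(math.log(p[c]) for p, c in zip(position_probs, chosen)) / len(chosen)

logits = [2.0, 1.0, 0.0]  # hypothetical logits over three residue types
for t in (0.1, 0.5, 1.0):
    print(t, [round(p, 3) for p in softmax(logits, t)])
```

Sequences with a higher (less negative) average log-probability would be ranked first when choosing candidates for experimental testing.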


Experimental evaluation of ProteinMPNN 


Although in silico native protein sequence 
recovery is a useful benchmark, the ultimate 
test of a protein design method is its ability 
to generate sequences that fold to the de- 
sired structure and have the desired function 
when tested experimentally. We evaluated 
ProteinMPNN on a representative set of pro- 
tein monomer, assembly, and function de- 
sign challenges. In each case, we attempted 
to rescue previous failed designs with sequen- 
ces generated using Rosetta or AlphaFold; 
we kept the backbones of the original designs 
fixed but discarded the original sequences and
generated new ones using ProteinMPNN. Syn-
thetic genes encoding the designs were ob- 
tained, and the proteins were expressed in 
Escherichia coli and characterized biochem- 
ically and structurally. 

We first tested the ability of ProteinMPNN 
to design amino acid sequences for protein 
backbones generated by deep network hallu- 
cination using AlphaFold. Starting from a ran- 
dom sequence, a Monte Carlo trajectory is 
carried out to optimize the extent to which 
AlphaFold predicts the sequence to fold to a 
well-defined structure (15). These calculations 
generated a wide range of protein sequences 
and backbones for both monomers and oligo- 
mers that differ considerably from those of 
native structures. In initial tests, the sequences 
generated by AlphaFold were encoded in syn- 
thetic genes, and we attempted to express 150 
proteins in E. coli. However, the AlphaFold- 
generated sequences were mostly insoluble 
(median soluble yield of 9 mg per liter of culture 
equivalent; Fig. 3A). To determine whether 
ProteinMPNN could overcome this problem, 
we generated sequences for a subset of these 
backbones with ProteinMPNN; residue iden- 


recovery and diversity as a function of sampling 
temperature. (E) Redesign of native protein 
backbones with ProteinMPNN considerably 
increases AlphaFold prediction accuracy com-
pared with the original native sequence using
no multiple sequence information. Single 
sequences (designed or native) were input in 
both cases. Dark orange indicates overlap. 

(F) ProteinMPNN redesign of previous Rosetta- 
designed NTF2 fold proteins (3000 backbones 
in total) results in considerably improved 
AlphaFold single-sequence prediction accuracy. 
Dark orange indicates overlap. 




tities at symmetry-equivalent positions were 
tied by averaging unnormalized probabilities. 
The designed sequences were again encoded 
in synthetic genes, and the proteins were pro- 
duced in E. coli. The success rate was far higher: 
Of the 96 designs that we attempted to ex- 
press in E. coli, 73 were expressed solubly 
(median soluble yield of 247 mg per liter of 
culture equivalent; Fig. 3A) and 50 had the 
target monomeric or oligomeric state as as- 
sessed by size exclusion chromatography (SEC) 
(Fig. 3, A and C). Many of the proteins were 
highly thermostable, with secondary structure 
being maintained up to 95°C (Fig. 3B). 

We solved the x-ray crystal structure of one 
of the ProteinMPNN monomer designs with a 
fold more complex [template modeling (TM)- 
score of 0.56 against PDB] than most de novo-
designed proteins (Fig. 3D). The α-β protein
structure contains five β strands and four α
helices and is close to the design target back-
bone (2.35 Å over 130 residues), demonstrat-
ing that ProteinMPNN can accurately encode
monomer backbone geometry in amino acid 
sequences. The accuracy was particularly high 
in the central core of the structure, with side
Fig. 3. Structural characterization of ProteinMPNN designs. (A) Comparison
of soluble protein expression over a set of AlphaFold hallucinated monomers 
and homo-oligomers (blue) and the same set of backbones with sequences 
designed using ProteinMPNN (orange) (N = 129). The total soluble protein 
yield after expression in E. coli, obtained from the integrated area under size 
exclusion traces of nickel-NTA-purified proteins, increases considerably from the 
barely soluble protein of the original sequences after ProteinMPNN rescue 
(median yields for 1 liter of culture equivalent are 9 and 247 mg, respectively). 
Boxes represent the quartiles of the soluble yield distribution and whiskers 
show the rest of it. (B to D) In-depth characterization of a monomer hallucination 
and corresponding ProteinMPNN rescue from the set in (A). Like almost all 

of the designs in (A), the sequence and structural similarities to the PDB of the 
design model are very low [expected value (E-value) = 2.8 against UniRef100
using HHblits; TM-score = 0.56 against PDB]. As shown in (B), the ProteinMPNN- 
rescued design has high thermostability, with a virtually unchanged circular 
dichroism profile at 95°C compared with 25°C. MRE, mean residue ellipticity. 
Shown in (C) is a SEC profile of the failed original design overlaid with the 
ProteinMPNN sequence design, which has a clear monodisperse peak at the
expected retention volume. mAU, milli-absorbance units. As shown in (D), 

the crystal structure of the ProteinMPNN (PDB ID 8CYK) design is nearly 
identical to the design model (2.35-Å RMSD over 130 residues); see fig. S5

for additional information. The right panel shows model side chains in the 
electron density; crystal side chains are in green, and AlphaFold side chains are 
in blue. (E and F) ProteinMPNN rescue of the Rosetta design made from a 
perfectly repeating structural and sequence unit. Residues at corresponding 
positions in the repeat unit were tied during ProteinMPNN sequence inference. 
Shown in (E) are a backbone design model (orange) and MPNN redesigned 
sequence AlphaFold model (blue) with tied residues indicated by lines (~1.2-Å
error over 232 residues). Shown in (F) is a SEC profile of the immobilized-metal
affinity chromatography (IMAC)-purified original Rosetta design and two
ProteinMPNN redesigns. (G and H) Tying residues during ProteinMPNN 
sequence inference both within and between chains to enforce both repeat 
protein and cyclic symmetries. Shown in (G) is a side view of the design model. A 
set of tied residues are shown in red. Shown in (H) is a top-down view of the 
design model. (I) Negative-stain electron micrograph of the purified design.
(J) Class averages of images from (I) closely match the top-down view in (H).
(K) Rescue of the failed two-component Rosetta tetrahedral nanoparticle design
T33-27 (16) by ProteinMPNN interface design. After ProteinMPNN rescue, the


chains predicted using AlphaFold from the 
ProteinMPNN sequence fitting nearly perfectly 
into the electron density (Fig. 3D). Crystal struc- 
tures and cryo-EM structures of 10 cyclic homo- 
oligomers with 130 to 1800 amino acids were 
also very close to the design target backbones 
(15). Thus, ProteinMPNN can robustly and ac- 
curately design sequences for both monomers 
and cyclic oligomers. 

We next took advantage of the flexible de- 
coding order of ProteinMPNN to design se- 
quences for proteins that contain internal 
repeats, tying the identities of residues in
equivalent positions. We focused on previously 
suboptimal Rosetta designs of repeat protein 
structures and found that many could be res- 
cued by ProteinMPNN redesign; an example is 
shown in Fig. 3, E and F. 

We next experimented with enforcing both 
cyclic and internal repeat symmetry by tying 
positions both within and between subunits, 
as illustrated in Fig. 3G. We experimentally 
characterized a set of such C5 and C6 cyclic
oligomers with backbones generated using 
Rosetta and with sequences designed either 
with Rosetta or with ProteinMPNN. For the 
Rosetta-designed set, only 4 of 10 designs 
tested were soluble and none had the cor- 
rect oligomeric state confirmed by SEC- 
multiangle light scattering (SEC-MALS). 
For the ProteinMPNN-designed set, 16 out of 
18 were soluble and 5 had the correct oligo- 
meric state. We characterized the structure 
of one of the designs that was large enough 
for resolution of structural features by negative- 
stain EM (Fig. 3I), and image averages were
closely consistent with the design model 
(Fig. 3J). 

We next evaluated the ability of ProteinMPNN
to design sequences that assemble into target 
protein nanoparticle assemblies. We started 
with a set of previously described protein back- 
bones for two-component tetrahedral designs 
that were generated using a compute- and 
effort-intensive procedure that involved Rosetta 
sequence design followed by more than a 
week of manual intervention to decrease sur- 
face hydrophobicity and improve interface 
packing (16). We used ProteinMPNN to design 
76 sequences spanning 27 of these tetrahedral 
nanoparticle backbones, tying identities at 
equivalent positions in the 12 copies of each 
subunit in the assemblies, and tested these 
sequences without further intervention. Upon 
expression in E. coli and purification by SEC,
13 designs formed assemblies with the ex- 
pected molecular weight (~1 MDa) (fig. S4), 
including several new tetrahedral assemblies
that had failed using Rosetta. We solved the 
crystal structure of one of these and found that 
it was very close to the design model [1.2-Å Cα
root mean square deviation (RMSD) over two 
subunits; Fig. 3K]. Thus, ProteinMPNN can 
robustly design sequences that assemble into 
designed nanoparticle structures, which have 
proven useful for structure-based vaccine 
design (17-19). Sequence generation with 
ProteinMPNN is fully automated and requires 
only about 1 s per backbone, vastly stream- 
lining the design process compared with the 
earlier Rosetta-based procedure. 

As a final test, we evaluated the ability of 
ProteinMPNN to rescue previously failed 
designs of new protein functions using Rosetta. 
We chose as a challenging example the design 
of proteins that scaffold polyproline II helix 
motifs recognized by SH3 domains, where por- 
tions of the protein scaffold outside of the core 
SH3-binding motif make additional interactions 
with the target (the goal is to generate pro- 
tein reagents with high affinity and specificity 
for individual SH3 family members). Back- 
bones that scaffold a proline-rich SH3-binding 
motif (PPPRPPK; where P is proline, R is 
arginine, and K is lysine) recognized by the 
Grb2 SH3 domain were generated using 
Rosetta remodel (see legend of Fig. 4; the 
SH3-binding motif is colored in green in 
Fig. 4A), but sequences designed for these 
backbones and expressed in E. coli did not
fold to structures that bind Grb2 (Fig. 4B; the 
design problem is challenging because very 
few native proteins have proline-rich second- 
ary structure elements that closely interact 
with the core of the protein). To test whether 
ProteinMPNN could overcome this problem, 
we generated sequences for the same back- 
bones while keeping the core SH3-binding 
motif sequence (PPPRPPK) fixed and expressed 
the proteins in E. coli. Biolayer interferometry
experiments showed strong binding to the 
Grb2 SH3 domain (Fig. 4B), with considerably 
higher signal than the free proline-rich pep- 
tide; point mutations predicted to disrupt the 
design completely eliminated the binding signal. 
Thus, ProteinMPNN can generate sequences 
for challenging protein design problems even 
when traditional Rosetta design fails. 


Conclusion 


ProteinMPNN solves sequence design prob- 
lems in a fraction of the time required for 
physically based approaches such as Rosetta, 
which carry out large-scale side chain packing 


nanoparticle assembled readily with high yield, and the crystal structure (gray) 
is very nearly identical to the design model (green and purple) (backbone 
RMSD of 1.2 Å over two complete asymmetric units forming the ProteinMPNN-


calculations; achieves much higher protein se- 
quence recovery on native backbones (52.4 
versus 32.9%); and rescues previously failed 
designs made using Rosetta or AlphaFold for 
protein monomers, assemblies, and protein- 
protein interfaces. Machine-learning sequence 
design approaches have been developed pre- 
viously (1-7), including the message-passing 
method on which ProteinMPNN is based, but 
have focused on the monomer design prob- 
lem, have achieved lower native sequence 
recoveries, and, with the exception of a triose- 
phosphate isomerase (TIM) barrel design study 
(6), have not been extensively validated using 
crystallography and cryo-EM. Whereas struc- 
ture prediction methods can be evaluated 
purely in silico, this is not the case for pro- 
tein design methods: In silico metrics such 
as native sequence recovery are very sensitive 
to crystallographic resolution (fig. S3, B and C) 
and may not correlate with proper folding 
(even a single residue substitution, while caus- 
ing little change in overall sequence recovery, 
can block folding); in the same way that 
language translation accuracy must ultimately 
be evaluated by human users, the ultimate 
test of sequence design methods is experimen- 
tal characterization. 
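
Native sequence recovery, the in silico metric discussed above, is simply the fraction of positions at which a designed sequence reproduces the native residue. The sketch below is a minimal illustration with made-up eight-residue sequences; the function name `sequence_recovery` and the example strings are assumptions, not part of the published benchmark.

```python
def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)

native = "MKTAYIAK"      # hypothetical native sequence
designed = "MKSAYLAK"    # hypothetical design, differing at two positions
recovery = sequence_recovery(designed, native)

# A single substitution barely changes the metric, whatever its structural effect.
one_mutation = "MKTAYIAP"
recovery_one_mutation = sequence_recovery(one_mutation, native)
```

The second comparison illustrates the caveat in the text: one substitution moves recovery only from 1.0 to 0.875 here, and the metric carries no information about whether that substitution blocks folding.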

Unlike Rosetta and other physically based 
methods, ProteinMPNN requires no expert 
customization for specific design challenges, 
and it should thus make protein design more 
broadly accessible. This robustness reflects 
fundamental differences in how the sequence 
design problem is framed. In traditional phys- 
ically based approaches, sequence design maps 
to the problem of identifying an amino acid se- 
quence whose lowest-energy state is the desired 
structure. This is, however, computationally 
intractable because it requires computing en- 
ergies over all possible structures, including 
unwanted oligomeric and aggregated states; in- 
stead, as a proxy, Rosetta and other approaches 
carry out a search for the lowest-energy se- 
quence for a given backbone structure, and 
structure prediction calculations are required 
in a second step to confirm that there are no 
other structures in which the sequence has 
still lower energy. Because of the lack of con- 
cordance between the design objective and 
what is being explicitly optimized, considera- 
ble customization can be required to generate 
sequences that fold; for example, in Rosetta 
design calculations, hydrophobic amino acids 
are often restricted on the protein surface be- 
cause they can stabilize undesired multimeric 
states and, at the boundary region between 
Fig. 4. Design of protein function with ProteinMPNN. (A) Design scheme.
The first panel shows the structure (PDB ID 2W0Z) of a fragment of Gab2 peptide
bound to the human Grb2 C-term SH3 domain (core SH3-binding motif PPPRPPK 
is in green; the target is rendered with surface and colored blue). In the second 
panel, helical bundle scaffolds were docked to the exposed face of the peptide using 
RIFDOCK (20), and Rosetta remodel was used to build loops connecting the 
peptide to the scaffolds. Rosetta sequence design with layer design task operations 
was used to optimize the sequence of the fusion (cyan) for stability, rigidity of 

the peptide-helical bundle interface, and binding affinity for the Grb2 SH3 domain. 
The third panel shows the ProteinMPNN redesign (orange) of the designed binder 
sequence; hydrogen bonds involving asparagine side chains between the peptide and 


base scaffold are shown in green and in the inset. In the fourth panel, mutation of 
the two asparagines (N) to aspartates (D) disrupts the scaffolding of the target 
peptide. (B) Experimental characterization of binding using biolayer interferometry. 
Biotinylated C-terminal SH3 domain from human Grb2 was loaded onto Streptavidin 
(SA) Biosensors, which were then immersed in solutions containing varying 
concentrations of SH3-binding peptide AIAPPPRPPKPSQ (first panel; A, alanine; I,
isoleucine; S, serine; Q, glutamine) or of the designs (second to fourth panels) 

and then transferred to buffer lacking added protein for dissociation measurements. 
The ProteinMPNN design (third panel) has much greater binding signal than the 
original Rosetta design (second panel); this is greatly reduced by the asparagine-to- 
aspartate mutations (fourth panel). 


the protein surface and core, there can be 
considerable ambiguity about the extent to 
which such restrictions should be applied. 
Although deep-learning methods lack the phys- 
ical transparency of methods like Rosetta, they 
are trained directly to find the most probable 
amino acid for a protein backbone given all 
the examples in the PDB, and hence such am- 
biguities do not arise, making sequence de- 
sign more robust and less dependent on the 
judgment of a human expert. 

The high rate of experimental design suc- 
cess of ProteinMPNN, together with the com- 
pute efficiency, applicability to almost any 
protein sequence design problem, and lack 
of requirement for customization, should 
make it very broadly useful for protein de- 
sign. ProteinMPNN-generated sequences also
have a much higher propensity to crystallize, 
greatly facilitating structure determination 
of designed proteins (15). The observation
that ProteinMPNN-generated sequences are 
predicted to fold to native protein backbones 
more confidently and accurately than the orig- 
inal native sequences (using single-sequence 
information in both cases) suggests that 
ProteinMPNN may also be widely useful in 
improving expression and stability of recom- 
binantly expressed native proteins (with resi- 
dues required for function kept fixed). 

Hallucinating symmetric protein assemblies


B. I. M. Wicky, L. F. Milles, A. Courbet, R. J. Ragotte, J. Dauparas, E. Kinfu, S. Tipps,
R. D. Kibler, M. Baek, F. DiMaio, X. Li, L. Carter, A. Kang, H. Nguyen, A. K. Bera, D. Baker


Deep learning generative approaches provide an opportunity to broadly explore protein structure space beyond 
the sequences and structures of natural proteins. Here, we use deep network hallucination to generate a 
wide range of symmetric protein homo-oligomers given only a specification of the number of protomers and 
the protomer length. Crystal structures of seven designs are very similar to the computational models (median 
root mean square deviation: 0.6 angstroms), as are three cryo-electron microscopy structures of giant 
10-nanometer rings with up to 1550 residues and C33 symmetry; all differ considerably from previously solved 
structures. Our results highlight the rich diversity of new protein structures that can be generated using deep 
learning and pave the way for the design of increasingly complex components for nanomachines and biomaterials. 


Cyclic protein oligomers play key roles in
almost all biological processes and con-
stitute nearly 30% of all deposited struc-
tures in the Protein Data Bank (PDB)
(1-4). Because of the many applications
of cyclic protein oligomers, ranging from small 
molecule binding and catalysis to building 
blocks for nanocage assemblies (5), de novo de- 
sign of such structures has been of considerable 
interest since the inception of the protein de- 
sign field (6, 7). While there have been a num- 
ber of successes (8-10), current approaches 
require specification of the structure of the 
monomers in advance and, with the exception 
of parametrically designed helical bundles 
(11, 12), have involved rigid-body docking of 
previously characterized monomers into higher- 
order symmetric structures followed by interface 
optimization to confer low energy to the as- 
sembled state (13-17). The requirement that 
the protomer structure be specified in advance 
has limited the exploration of the full space of 
oligomeric structures, such as assemblies with 
more-intertwined chains. For monomeric pro- 
tein design, broad exploration of the space of 
possible structures has become possible by deep 
network hallucination: Starting from a random 
amino acid sequence, Markov chain Monte 
Carlo (MCMC) optimization favoring folding to 
a well-defined state converges on new sequences 
that fold to novel structures (18-21). By extension,
we reasoned that deep network hallucination 
could enable the design of higher-order protein 
assemblies in one step, without prespecification 
or experimental confirmation of the structures 
of the protomers, provided that a suitable loss 
function specifying both protomer folding and 
assembly could be formulated (18-20, 22-25).
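
The hallucination idea, a Monte Carlo search in sequence space that starts from a random sequence and accepts mutations according to a loss, can be sketched as follows. This is only a toy: in the work described here the loss is derived from structure prediction network outputs, whereas `toy_loss` below is a stand-in chosen so that the example runs on its own. The function names, step count, and temperature are all assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_loss(seq):
    """Placeholder loss: fraction of residues that are not alanine.

    A real hallucination run would score a predicted structure instead.
    """
    return sum(aa != "A" for aa in seq) / len(seq)

def mcmc_search(length, loss_fn, steps=3000, temperature=0.005, seed=0):
    """Metropolis search over sequences using single-residue mutations."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    loss = loss_fn(seq)
    for _ in range(steps):
        proposal = seq.copy()
        proposal[rng.randrange(length)] = rng.choice(AMINO_ACIDS)
        new_loss = loss_fn(proposal)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_loss <= loss or rng.random() < math.exp((loss - new_loss) / temperature):
            seq, loss = proposal, new_loss
    return "".join(seq), loss

final_seq, final_loss = mcmc_search(30, toy_loss)
```

Swapping `toy_loss` for a function that calls a structure predictor and scores confidence and symmetry would give the general shape of the approach, at far greater cost per step.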


Computational approach 


We set out to broadly explore the space of 
cyclic protein homo-oligomers by developing a 
method for hallucinating such structures that
places no constraints on the structures of either 
the protomers or the overall assemblies. Start- 
ing from only a choice of chain length L and 
oligomer valency N (2 for a dimer, 3 for a 
trimer, etc.), the method carries out a Monte 
Carlo search in sequence space starting from 
a random sequence (Fig. 1A). The loss func- 
tion guiding the search is computed by in- 
putting N copies of the sequence into the 
AlphaFold2 (AF2) network (26) and combining 
structure prediction confidence metrics [pre- 
dicted local distance difference test (pLDDT); 
per-residue structural accuracy (27); and pTM, 
an estimate of the template modeling (TM)- 
score (28)] with a measure of cyclic symmetry 
(the standard deviation of the distances be- 
tween the center of mass of adjacent proto- 
mers within the predicted structure). 
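
The cyclic symmetry measure described above, the standard deviation of the distances between the centers of mass of adjacent protomers, is easy to write down explicitly. The sketch below assumes equal atomic masses (so the center of mass is the coordinate mean) and combines the terms with arbitrary unit weights; the weighting, the function names, and the toy coordinates are assumptions, since the text does not give the exact form of the combined loss.

```python
import numpy as np

def cyclic_symmetry_term(coords):
    """Std. dev. of distances between centers of adjacent protomers.

    coords: array of shape (N, L, 3) holding N protomers of L atoms each,
    ordered around the ring so that protomer N-1 is adjacent to protomer 0.
    """
    centers = coords.mean(axis=1)                  # (N, 3), equal masses assumed
    neighbours = np.roll(centers, -1, axis=0)      # next protomer, wrapping around
    distances = np.linalg.norm(centers - neighbours, axis=1)
    return distances.std()

def hallucination_loss(plddt, ptm, coords, w_sym=1.0):
    """Hypothetical combination: low when confidence is high and the ring is regular."""
    return (1.0 - plddt) + (1.0 - ptm) + w_sym * cyclic_symmetry_term(coords)

def rotate_z(points, angle):
    """Rotate an (L, 3) point set about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T

rng = np.random.default_rng(1)
protomer = rng.normal(size=(50, 3)) + np.array([15.0, 0.0, 0.0])
trimer = np.stack([rotate_z(protomer, 2 * np.pi * k / 3) for k in range(3)])
sym_perfect = cyclic_symmetry_term(trimer)         # ideal C3 arrangement

distorted = trimer.copy()
distorted[0] += np.array([5.0, 0.0, 0.0])          # push one protomer outwards
sym_distorted = cyclic_symmetry_term(distorted)
```

For a perfectly symmetric trimer the term is zero up to floating-point error, and it grows once one protomer is displaced. Note that in this form the term is trivially zero for a dimer, because both neighbour distances are the same distance.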

We found that monomers and dimeric to 
heptameric assemblies could readily be gen- 
erated by this procedure for chains of 65 to 130 
amino acids, with converging trajectories typ- 
ically coalescing to cyclic homo-oligomeric 
structures within a few hundred steps (~1 to 
7 CPU-days for monomers to heptamers, re- 
spectively) (figs. S1 and S2). The resulting 
structures are topologically diverse, spanning 
all-α, mixed α/β, and all-β structures, and
differ from the structures of cyclic de novo 
designs present in the PDB (Fig. 1B). These 
assemblies, which we call HALs, also differ 
from natural proteins in both structure (Fig. 
1C) and sequence (Fig. 1D), with the median 
closest relatives in the PDB having TM-scores 
of 0.67 and 0.57 for the protomers and oligo- 
mers, respectively [29% of the structures have 
TM-scores of <0.5, the cutoff for fold assign- 
ment in CATH/SCOP (29)], indicating con- 
siderable generalization beyond the PDB 
training set. 


Experimental biophysical characterization 


We selected 150 designs with AF2 pLDDT > 
0.7 and pTM > 0.7 for experimental testing. 
However, virtually none showed appreciable sol- 
uble expression when produced in Escherichia 
coli (median soluble yield: 9 mg per liter of 
culture equivalent) (fig. S3), and of the few
that were marginally soluble, none had both 
the expected oligomerization state by size ex- 
clusion chromatography (SEC) and a circular 
dichroism (CD) profile consistent with the 
hallucinated structure. We speculated that
this failure could be a consequence of over- 
fitting during MCMC optimization leading to 
the generation of adversarial sequences that 
the network confidently predicts are struc- 


tured but in actuality have aberrant biophys- 
ical properties (figs. S4 and S5). Adversarial 
samples have been generated by activation 
maximization in the context of image classifi- 
cation neural networks, which similarly leads 


Fig. 1. Hallucinating symmetric protein 
assemblies. (A) Starting from choice of a 
cyclic symmetry and protein length, a random 
sequence is optimized by MCMC through the 
AF2 network until the resulting structure fits 
the design objective, followed by sequence 
redesign with ProteinMPNN. (B) The method 
generates structurally diverse outputs, 
quantified here by multidimensional scaling 

of protomer pairwise structural similarities 
between experimentally tested HALs (N = 351) 
and all de novo cyclic oligomers present 

in the PDB (N = 162). (C) Generated structures 
differ from those in the PDB. Median TM- 
scores to the closest match: 0.67 and 0.57 for 
the protomers and oligomers, respectively 
(vertical lines). (D) Generated sequences are 
unrelated to naturally occurring proteins. 
Median BLAST E-values from the closest hit in
UniRef100: 2.6 and 1.3 for the repeat motifs
and protomers, respectively (vertical lines). 
(E) Number of ProteinMPNN design successes 
at different levels of characterization. Mono- 
disp., monodisperse. (F) Most soluble HALs 
have SEC retention volumes consistent with 
their oligomeric state. The gray line shows the 
fit to calibration standards (open circles), and 
the shaded area represents the 95% confi- 
dence interval of the calibration. (G) The 
observed molecular weights of HALs from 
SEC-MALS are close to those computed from 
the design models. (H) ProteinMPNN-designed 
HALs are thermostable. Pre-melting and 
post-melting retention volumes are closely 
correlated; circles represent designs that 
remained monodisperse, while triangles indi- 
cate polydispersity after heat treatment. In (E) 
to (H), the data are categorized by cyclic 
symmetry classes using the color scheme
shown in (H). In (G) and (H), the line
indicates parity. 
Fig. 2. Structures of HALs solved by x-ray crystallography are very close to their design models.

(A) HALC2_062 (RMSD: 0.81 Å). (B) HALC2_065 (RMSD: 1.02 Å). (C) HALC2_068 (RMSD: 0.86 Å). (D) HALC3_104
(RMSD: 0.42 Å). (E) HALC3_109 (RMSD: 0.46 Å). (F) HALC4_135 (RMSD: 0.60 Å). (G) HALC4_136 (RMSD: 0.34 Å).
For each row, the first panel (from the left) shows a surface rendering of the oligomer with one protomer
highlighted in purple, the second highlights the fit of the side-chain rotamers of the design model to the 2mFo-DFc map
(in gray), and the last two panels show two different orientations of the structural overlays between the model 


(gray) and the solved structure (colored by chains). 


to unrealistic outputs (30-32). To eliminate 
such over-fitting, we generated new sequen- 
ces for the HAL backbones using the recently 
developed ProteinMPNN sequence design 
neural network (33). For each original back- 
bone, 24 to 48 sequences were generated 
with ProteinMPNN, and assembly to the target 
oligomeric structure was validated with AF2 
(these dozens of evaluations, compared with 
the hundreds performed during hallucination, 
make overfitting much less likely). In addition, 
we independently evaluated the sequences 
using an updated version of RoseTTAFold
(RF2) (34) and found that while RF2 did not 
confidently predict the structure of most of 
the original AF2 hallucinated sequences, it did 
successfully predict almost all ProteinMPNN 
sequences (figs. S4, S6, and S7). 

We tested 96 ProteinMPNN-designed HALs 
with pLDDT > 0.75 and root mean square de- 
viation (RMSD) to original backbone < 1.5 Å
and found that 71/96 (74%) were expressed to 
high levels (median yield: 247 mg per liter of 
culture equivalent), 50/96 (52%) had a SEC 


retention volume consistent with the size of 
the oligomer [of which 30 (60%) were mono- 
disperse] (Fig. 1F and figs. S8 and S9), and at 
least 21/96 (22%) had the correct oligomeric 
state when assessed by SEC-multiangle light 
scattering (SEC-MALS) (Fig. 1G and fig. S10). 
CD analysis of the soluble samples indicated 
that 67/71 (94%) had secondary structure con- 
tents consistent with the designs (fig. S9). These 
success rates are in stark contrast to those of the 
original AF2 hallucinated sequences, indicating 
that the MCMC procedure generates viable 
backbones but over-fitted sequences (which 
exhibit various pathologies; fig. S5) and high- 
lighting the power of ProteinMPNN to gener- 
ate sequences that fold to a given backbone 
structure (Fig. 1E). We assessed the thermal 
stability of the 71 soluble HALs by CD spec- 
troscopy and found that 54 maintained their 
secondary structure up to 95°C (fig. S9). SEC 
characterization of the heat-treated samples 
indicated that most designs retained their olig- 
omeric state, suggesting that ProteinMPNN- 
designed HALs are thermostable (Fig. 1H and 
fig. S9). 
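
As a quick consistency check, the percentages quoted above follow directly from the reported counts. The snippet below only redoes that arithmetic; the dictionary labels are paraphrases introduced here.

```python
# (successes, total) pairs taken from the counts reported in the text.
counts = {
    "expressed to high levels": (71, 96),
    "SEC retention volume consistent with oligomer size": (50, 96),
    "monodisperse among SEC-consistent designs": (30, 50),
    "correct oligomeric state by SEC-MALS": (21, 96),
    "CD consistent with design among soluble samples": (67, 71),
}

percentages = {
    label: round(100 * hits / total) for label, (hits, total) in counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The rounded values reproduce the 74%, 52%, 60%, 22%, and 94% given in the text.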


Structure determination 


To evaluate design accuracy, we attempted 
crystallization of 19 designs and succeeded in 
solving crystal structures for seven (three C2,
two C3, and two C4 designs) (Fig. 2). All crys-
tal structures had the correct oligomerization
state and closely matched the design models
(median Cα RMSD of 0.6 Å across all designs,
with resolutions ranging from 1.8 to 3.4 Å)
(fig. S11 and table S1). The side-chain confor- 
mations in the crystal structures also closely 
matched those of the design models (Fig. 2). 
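
RMSD values such as those quoted above are conventionally computed after optimally superimposing the two coordinate sets. A minimal sketch using the Kabsch algorithm is given below. It assumes the two structures are already matched atom for atom (for example, Cα to Cα) and is not a statement of the exact alignment procedure used for the reported numbers; the function name and the random test coordinates are illustrative.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between (n, 3) coordinate sets P and Q after optimal superposition."""
    P = P - P.mean(axis=0)                 # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                            # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T)) # guard against improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T                        # rotate P onto Q
    return float(np.sqrt(((P_rot - Q) ** 2).sum(axis=1).mean()))

rng = np.random.default_rng(0)
model = rng.normal(size=(40, 3))           # hypothetical design-model coordinates

theta = 0.7
c, s = np.cos(theta), np.sin(theta)
rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = model @ rot.T + np.array([3.0, -2.0, 5.0])   # same shape, new pose
rmsd_exact = kabsch_rmsd(model, moved)

noisy = moved + rng.normal(scale=0.5, size=moved.shape)
rmsd_noisy = kabsch_rmsd(model, noisy)
```

A rigidly moved copy gives an RMSD of essentially zero, whereas coordinate noise gives a nonzero value regardless of how the structure is posed, which is why superposition precedes the comparison.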

The solved structures exhibit notable diver- 
sity, with many intricate structural features. 
HALC2_062 (Fig. 2A) is a three-layer homo- 
dimer with a single helix from each protomer 
packed together between two outer β sheets
(one from each protomer), whereas HALC2_065
(Fig. 2B) is also a mixed α/β homodimer but
has a single, continuous β sheet shared be-
tween both chains, which wraps around two 
perpendicular paired helices. These two hal- 
lucinated structures are distinct from any 
structure in the PDB, with TM-scores to their 
best matches of 0.59 and 0.54, respectively 
(Fig. 3, A and B, and table S2). HALC2_068 
(Fig. 2C) is a fully helical dimer with an ex- 
tensive interface formed by six interacting 
helices (three from each protomer), with a 
single perpendicular helix buttressing the in- 
terfacial helices. Despite the low secondary 
structure complexity and absence of long- 
range contacts, this design also differs consid- 
erably from its closest structural relative in the 
PDB (TM-score: 0.57) (Fig. 3C and table S2). 
HALC3_104 (Fig. 2D) is a homotrimeric coiled 
coil, with a central bundle of three helices, 
augmented by an outer ring of three shorter 
Fig. 3. Hallucinated structures differ considerably from their closest matches in the PDB. For each
structure solved by crystallography (Fig. 2) or cryo-EM (Fig. 4B), the closest structural matches to the 
protomer and the oligomer are shown on the left and right, respectively. Designs are colored by chain, and 
the closest matching PDB is shown in gray. In most cases, the closest oligomer has an entirely different 
structure; this is particularly evident for the larger designs in (G) and (H). TM-scores (protomer, oligomer) 
are indicated in parentheses, and the PDB IDs are reported in table S2. (A) HALC2_062 (0.69, 0.59). 

(B) HALC2_065 (0.67, 0.54). (C) HALC2_068 (0.67, 0.57). (D) HALC3_104 (0.87, 0.88). (E) HALC3_109
(0.78, 0.69). (F) HALC4_135 (0.80, 0.59). (G) HALC4_136 (0.80, 0.71). (H) HALC15-5_262 (0.65, 0.46).

(I) HALC18-6_265 (0.65, 0.49). (J) HALC33-3_343 (0.49, 0.41). 


helices that lie in the groove formed by the 
adjacent protomer (the closest matching struc- 
ture in the PDB has a TM-score of 0.88) (Fig. 
3D and table S2). HALC3_109 (Fig. 2E) is a 
homotrimeric three-layer all-helical structure, 
with three inner helices splaying outwards to 
contact two additional helices from the same
protomers at angles of roughly 25° and 
90°; the closest assembly in the PDB has a 
TM-score of 0.69 (Fig. 3E and table S2). 
HALC4_135 (Fig. 2F) is a coiled coil com- 
posed of helical hairpins reminiscent of 


HALC3_104, but with C4 symmetry instead
of C3 symmetry, and a discontinuous super-
helical twist. Despite its simple topology, the 
closest structural homolog to this design has a 
TM-score of only 0.59 (Fig. 3F and table S2). 
HALC4_136 (Fig. 2G) is composed of three-
helix protomers with eight outer helices en- 
casing four almost fully hydrophobic inner 
helices, where two of the helices are rigidly 
linked through a 90° helical kink. The closest 
match in the PDB has a TM-score of 0.71, but 
the matched structure has C; symmetry rather 
than the C4 symmetry of the design and crys-
tal structure (Fig. 3G and table S2).

Next, we sought to generate HALs of greater 
complexities across longer length scales by ex- 
tending the design specifications to structures 
of higher symmetry (up to C42) and longer
oligomeric assembly sequence lengths (up to 
1800 residues). To generate multiple possible 
oligomers from a single structure, we specified 
the MCMC trajectories as single chains with 
internal sequence symmetry; the resulting 
structure-symmetric repeat proteins can be 
split into any desired oligomeric assembly com- 
patible with factorization (e.g., C15 into a pen-
tamer, shorthanded as C15-5). To maximize the
exploration of the design space while minimiz- 
ing the use of computational resources, we 
devised an evolution-based computational strat- 
egy: Many short MCMC trajectory (<50 steps) 
outputs were clustered by structure prediction 
confidence metrics (pLDDT and pTM) and 
then used to seed new trajectories (see sup- 
plementary materials). Using this approach, 
we hallucinated cyclic homo-oligomers from 
C5 to C42 with their largest dimension ranging
from 7 to 14 nm (median: 10 nm), which were 
then divided into homotrimers, -tetramers, 
-pentamers, -hexamers, -heptamers, -octamers, 
and a dodecamer, and the backbones were 
redesigned with ProteinMPNN (Fig. 1, A and 
B). Although the α/β topology of some of
these larger HALs is reminiscent of natural 
leucine-rich repeats (LRRs) (35), which is re- 
flected by a median highest protomer TM- 
score of 0.64, these ring-shaped structures differ 
considerably from the horseshoe folds of LRRs 
that do not close into cyclic structures. The 
closest oligomer structures in the PDB have a 
median TM-score of 0.47, and BLAST (basic 
local alignment search tool) sequence similarity 
searches for the repetitive sequence motif do 
not return any significant hits (Fig. 1D); the 
hallucination process, as in the earlier cases, 
generalizes beyond the training set. 
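
The factorization rule mentioned above, that a single-chain ring with n internal repeats can be split into any oligomer whose subunit count divides n, can be enumerated directly. The helper names below are hypothetical; the label format follows the CX-Y convention used in the figure legends (internal symmetry X, oligomerization state Y).

```python
def compatible_oligomers(n_repeats):
    """List (oligomer_state, repeats_per_protomer) pairs for a ring of n repeats."""
    return [
        (k, n_repeats // k)
        for k in range(2, n_repeats + 1)
        if n_repeats % k == 0
    ]

def design_label(n_repeats, oligomer_state):
    """Shorthand such as 'C15-5': 15 internal repeats split into a pentamer."""
    if n_repeats % oligomer_state != 0:
        raise ValueError("oligomer state must divide the number of repeats")
    return f"C{n_repeats}-{oligomer_state}"

options_c15 = compatible_oligomers(15)
label = design_label(15, 5)
```

For a ring of 15 repeats this gives a trimer of five repeats per chain, a pentamer of three, or a 15-mer of single repeats, which is why one hallucinated backbone can yield several candidate oligomers.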

These larger HALs have overall molecular 
weights greater than 100 kDa and thus were 
well suited for structural characterization by 
electron microscopy (EM). We screened solu- 
ble large HALs with a SEC retention volume 
consistent with the size of their oligomeric 
state by negative stain EM (nsEM) and in most 
cases observed monodisperse particles of the 
Fig. 4. Cryo-electron microscopy and negative stain electron microscopy
validation of large HALs. For each design, the model is shown colored by chain 
and the corresponding internal symmetry (X) and oligomerization state (Y) are 
indicated (CX-Y). The electron density map is shown next to the model alongside 
characteristic 2D class averages. (A) Negative stain characterization of HALs. Ring 
diameters are 92, 110, 75, 80, 100, and 107 Å for HALC6_220, HALC24-6_316,
HALC20-5_308, HALC25-5_341, HALC18-6_278, and HALC42-7_351, respectively.


expected size and circular shape. We obtained 
two-dimensional (2D) class averages and 3D 
ab initio reconstructed electron density maps 
for six designs with C6 to C42 internal repeat
symmetry (factorized as two C5, three C6, and
one C7) that clearly showed low-resolution
structural features and diameters consistent 
with their designs (Fig. 4A and fig. S12). We 
selected three designs: one C15 homopentamer
(HALC5-15_262), one C18 homohexamer (HALC6-
18_265), and one C33 homotrimer (HALC3- 
33_343) for high-resolution single-particle 
cryo-electron microscopy (cryo-EM) charac- 
terization. We collected datasets that produced 
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2D class averages with clear secondary struc- 
ture feature placements, and 3D ab initio 
reconstruction and refinement yielded 3D 
electron density maps at 4.38-, 6.51-, and 
6.32-A resolution, respectively (Fig. 4B and 
figs. S13 to S16). HALC5-15_262 was originally 
designed as a homohexamer, but structure 
prediction calculations were more consistent 
with a pentameric structure of nearly identical 
protomer conformation and a very slightly 
shifted subunit interface (fig. S17); the cryo- 
EM structure is also a pentamer with a Ca 
RMSD of 1.69 A to this predicted structure 
(fig. S16). 


(B) Cryo-EM characterization of three large HALs. The ring diameters are 87, 99, 
and 100 A for HALC15-5_262, HALC18-6_265, and HALC33-3_343, respectively. 
Top row, left panels: design model colored by chain. Top row, right panels: 
superpositions of the cryo-EM model (gray) and design model (blue). The computed 
backbone atom RMSD between the designed and experimental structure is 0.81, 
1.69, and 2.30 A, respectively (fig. S16). Bottom row: 4.38, 6.51, and 6.32 A cryo-EM 


e bars, 10 nm. 


These hallucinated rings are giant struc- 
tures quite unlike anything in the PDB. The 
three rings solved by cryo-EM, HALC5-15_262,
HALC6-18_265, and HALC3-33_343, are 87,
99, and 100 Å in diameter, respectively, and
40 to 50 Å high, with a continuous parallel
β sheet in the lumen of the pore and outer
helices that enforce the curvature and closure 
of the ring. HALC3-33_343 has a simple helix- 
loop-sheet structural motif as its repeating 
unit, whereas in HALC5-15_262 and HALC6- 
18_265, the repeating unit contains two dis- 
tinct helix-loop-sheet elements, which produces 
an alternating helical outer pattern clearly 
observable in the 2D class averages. Although
both structures have matches to LRRs for their 
protomers (TM-score of 0.65 for both, but to 
different structures), the oligomeric assemblies 
are very different from any natural protein 
(TM-scores of 0.48 and 0.49, respectively) 
(Fig. 3, H and I, and table S2). HALC3-33_343
has an unusual internal loop region breaking 
the outer helices midway in the repeat, pro- 
ducing a widening of the ring on one side, 
which is clearly visible in the cryo-EM re- 
construction; the protomer has a low TM-score 
(0.48) despite having an LRR-like topology, 
and the oligomer is even further from any cur- 
rently known structure (TM-score: 0.41) (Fig. 
3J and table S2). The high structural symmetry
of these designed complexes rivals that
of natural proteins—the highest cyclic sym- 
metry recorded in the PDB for naturally oc- 
curring proteins is C39 [vault proteins (36), 
PDB IDs 4HL8 and 7PKY]. 


Conclusion 


Our deep learning-based approach to design- 
ing cyclic homo-oligomers jointly generates 
protomers and their oligomeric assemblies 
without the need for a hierarchical docking 
approach. We report a rich assortment of de 
novo protein homo-oligomers across the nano- 
scopic scale, with broad topological diversity 
while maintaining design constraints such as 
symmetry and oligomeric state. These hallu- 
cinated oligomers differ substantially from 
natural oligomers in both sequence (median 
lowest BLAST E-value against UniRef100 of 
1.3 for the repeated sequence motifs) (Fig. 1D 
and table S3) and structure (median best TM- 
score between biounits from the PDB and 
HALs of 0.57) (Fig. 1C and table S2); our computational
pipeline interpolates and extends
native fold-space rather than simply recapit- 
ulating memorized protein structures, demon- 
strating the power of deep learning to explore 
previously uncharted regions of the design 
landscape (Fig. 1B). Our results also highlight 
the power of the ProteinMPNN method for 
protein sequence design; of the 30 out of the 
192 designs evaluated experimentally by either 
SEC-MALS, nsEM, cryo-EM, or x-ray crystal- 
lography, 27 had the intended oligomeric state, 
and 7 out of 19 for which crystallization was 
attempted formed diffracting crystals (this is 
a considerably higher crystallization success 
rate than is typical for Rosetta de novo de- 
signs, suggesting that ProteinMPNN may gen- 
erate protein surfaces more likely to form 
crystal contacts). More generally, our results 
show that a rich diversity of protein structures 
and assemblies beyond what exists in the PDB 
can now be accessed by deep learning-based 
generative models. 
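
The success rates implied by the counts above can be made explicit with a short calculation. This is only an illustrative sketch: the counts are taken from the text, while the function and variable names are hypothetical.

```python
def success_rate(successes: int, attempts: int) -> float:
    """Fraction of attempts that succeeded."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return successes / attempts


# Counts quoted in the text.
OLIGOMERIC_STATE = (27, 30)   # designs with the intended oligomeric state / designs evaluated
CRYSTALLIZATION = (7, 19)     # designs giving diffracting crystals / designs attempted

if __name__ == "__main__":
    print(f"intended oligomeric state: {success_rate(*OLIGOMERIC_STATE):.0%}")
    print(f"diffracting crystals:      {success_rate(*CRYSTALLIZATION):.0%}")
```

Expressed this way, 27 of 30 corresponds to 90% and 7 of 19 to roughly 37%.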

The formalism described here can be ex- 
tended to other types of complex design tasks, 
including the design of higher-order point 
group symmetries, arbitrary symmetric or
asymmetric hetero-oligomeric assemblies, olig- 
omeric scaffolding of existing functional do- 
mains, and design of multiple states, provided 
a loss function describing the solution can be 
formalized and computed. Computational re- 
quirements and hardware memory limitations 
become bottlenecks for hallucination of in- 
creasingly large structures; the development 
of computationally less expensive structure 
prediction methods with fewer parameters, as 
well as generative approaches such as diffu- 
sion models (37, 38) that more directly sample 
in structure space, should enable the design 
of even more complex protein structures and 
assemblies. 

TRANSCRIPTION


Structures of +1 nucleosome-bound PIC-Mediator complex


Xizi Chen, Xinxin Wang, Weida Liu, Yulei Ren, Xuechun Qu, Jiabei Li, Xiaotong Yin, Yanhui Xu


RNA polymerase II-mediated eukaryotic transcription starts with the assembly of the preinitiation complex 
(PIC) on core promoters. The +1 nucleosome is well positioned about 40 base pairs downstream of the 
transcription start site (TSS) and is commonly known as a barrier to transcription. The +1 nucleosome-bound
PIC-Mediator structures show that PIC-Mediator prefers binding to the T40N nucleosome located 40 base pairs
downstream of TSS and contacts T50N but not the T70N nucleosome. The nucleosome facilitates the
organization of PIC-Mediator on the promoter by binding TFIIH subunit p52 and Mediator subunits MED19 and
MED26 and may contribute to transcription initiation. PIC-Mediator exhibits multiple nucleosome-binding
patterns, supporting a structural role of the +1 nucleosome in the coordination of PIC-Mediator assembly. Our 
study reveals the molecular mechanism of PIC-Mediator organization on chromatin and underscores the 
significance of the +1 nucleosome in regulating transcription initiation. 


Core promoters of transcriptionally active
genes in eukaryotic cells are character- 
ized by a nucleosome-depleted region 
with two flanking nucleosomes com- 
monly known as the -1 (the last upstream) 
and the +1 (the first downstream) nucleosomes 
(1-4). The naked core promoter, ~100 base pairs 
(bp) in length, covers the transcription start 
site (TSS) and permits access of transcription 
machineries to initiate transcription (5). Tran- 
scription initiation starts with the assembly 
of a preinitiation complex (PIC) formed by 
RNA polymerase II (Pol II) and general tran- 
scription factors, including TFIID, TFIIA, TFIIB, 
TFIIF, TFIIE, and TFIIH (6-8). TFIID is globally 
required for transcription initiation and its 
function could not be replaced by TATA box-
binding protein (TBP) (9-11). TFIIH has two
enzymatic subunits, the DNA translocase 
xeroderma pigmentosum type B (XPB) for 
promoter opening (12) and cyclin-dependent 
kinase 7 (CDK7) for phosphorylation of the Pol 
II C-terminal domain (CTD) (13). As a critical 
transcription coactivator, the multisubunit 
Mediator bridges transcription factors and 
Pol II and activates transcription (14, 15).
Eukaryotic transcription is regulated by 
the chromatin architecture around the core 
promoter, which is established by combina- 
tory factors including DNA sequence, tran- 
scription factors, and chromatin remodelers 
(1-5). The +1 nucleosome is thought to be a
barrier of Pol II-mediated early elongation 
and a transcription regulator containing char- 
acteristic epigenetic marks including H2A.Z 
and H3.3 histone variants and histone acet- 
ylation and H3K4 trimethylation (H3K4me3) 
(1-4, 16-18). For example, recruitment of TFIID 
to chromatin is enhanced by binding H3K4me3 
and histone acetylation (19, 20). The +1 nu- 
cleosome is positioned ~40 bp downstream 
of TSS (21-23), and PIC may also regulate the
dynamics of the +1 nucleosome (24), suggest- 
ing a functional connection between PIC and 
the +1 nucleosome in a position-dependent 
manner. We and others recently reported struc- 
tures of PIC (25) and PIC-Mediator (26-29) on 
naked promoters. Structures of the nucleosome- 
bound transcription elongation complex 
revealed how Pol II traverses nucleosomes 
during elongation (30-33). Despite these stud- 
ies, it remains elusive whether and how the 
+1 nucleosome directly contacts and regulates 
transcription initiation machineries on most 
active genes. 


Structure determination of the nucleosome- 
bound PIC-Mediator complex 


To assemble a template containing promoter 
and downstream nucleosome, we synthesized 
a modified super core promoter (25, 26) DNA 
scaffold in which a nucleosome-positioning 
sequence (Widom-601) (34) is placed at +41 to 
+188, followed by a short extranucleosomal DNA 
(+189 to +204) (Fig. 1A). The H2A.Z-containing 
nucleosome was reconstituted and named 
T40N, where “T” denotes TSS, “N” denotes the 
nucleosome, and “40” denotes the 40-bp gap 
between TSS and the 5’ end of nucleosomal 
DNA. In T40N, upstream and downstream 
promoter regions are defined relative to the 
putative TSS, which is separated from the dyad 
(or midpoint) of the nucleosome by 114 bp. The 
PIC-Mediator complex was assembled on T40N, 
and the complex is designated PIC^T40N-MED
(fig. S1). The cryo-electron microscopy (cryo-EM)
structure was determined and the structural
models were built with structures of the
PIC-Mediator complex (26) and the H2A.Z- 
containing nucleosome (35) as templates (figs. 
S1 to S3 and data S1). 
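
The scaffold coordinates described above can be checked with a small bookkeeping sketch. It assumes only the positions quoted in the text (nucleosomal DNA at +41 to +188 for T40N); the class, its field names, and the layouts of T50N and T70N (obtained here by shifting the same segment 10 and 30 bp further downstream) are illustrative assumptions rather than details taken from the methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NucleosomeScaffold:
    """Promoter-nucleosome template; coordinates are relative to the TSS (+1)."""
    name: str
    nucleosome_start: int  # first nucleosomal base pair (5' end)
    nucleosome_end: int    # last nucleosomal base pair (3' end)

    @property
    def gap_bp(self) -> int:
        # Base pairs from the TSS (+1) up to, but not including, the nucleosome.
        return self.nucleosome_start - 1

    @property
    def nucleosome_length(self) -> int:
        return self.nucleosome_end - self.nucleosome_start + 1

    @property
    def dyad_position(self) -> float:
        # Midpoint of the nucleosomal DNA segment.
        return (self.nucleosome_start + self.nucleosome_end) / 2

    def shifted(self, name: str, extra_gap_bp: int) -> "NucleosomeScaffold":
        # Hypothetical helper: same nucleosomal segment, moved further downstream.
        return NucleosomeScaffold(
            name,
            self.nucleosome_start + extra_gap_bp,
            self.nucleosome_end + extra_gap_bp,
        )


T40N = NucleosomeScaffold("T40N", 41, 188)   # positions quoted in the text
T50N = T40N.shifted("T50N", 10)              # assumed layout
T70N = T40N.shifted("T70N", 30)              # assumed layout

if __name__ == "__main__":
    for s in (T40N, T50N, T70N):
        print(s.name, s.gap_bp, s.nucleosome_length, s.dyad_position)
```

For T40N this gives a 40-bp gap and a midpoint near +114, in line with the ~114-bp TSS-to-dyad separation quoted above.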

Our previous structural study of PIC-Medi- 
ator showed a conformation of the holo-PIC- 
Mediator (hPIC-MED) and a transition state 
called pre-hPIC-MED, the latter of which re- 
sults from binding of a TFIID domain to the 
Initiator (Inr) element (5) of the super core 
promoter (26). To avoid this conformational 
heterogeneity, we mutated Inr in the T40N 
scaffold (see the materials and methods). As 
expected, cryo-EM three-dimensional (3D) clas- 
sification showed a predominant hPIC confor- 
mation with the promoter suspended above 
the active-site cleft of Pol II (Fig. 1B and fig. S1).


Overall structure of the nucleosome-bound 
PIC-Mediator 


The cryo-EM map of PIC^T40N-MED has dimensions
of ~470 Å × 420 Å × 240 Å and comprises
84 protein molecules with a molecular weight
of ~4.3 MDa (Fig. 1B and movie S1), which
represents the largest reported structure of
transcription machinery. PIC and Mediator
are organized similarly to hPIC-MED (26). As
shown in Fig. 1B, Pol II, TFIIF, TFIIA, TFIIB,
TBP, and TFIIE form a central PIC core, which 
binds the 26-subunit Mediator on the top, the 
20-subunit TFIID on the bottom, and the 10- 
subunit TFIIH on one side. From the 5’ to the 
3' end, the promoter is bent by TBP and TFIIB 
at -31/-24 (TATA box), passes through and is 
well stabilized by the PIC core at -23/+11, is 
grasped by XPB at +12/+20, extends out of PIC 
and runs toward the nucleosome at +21/+40, 
and wraps around the histone octamer as a 
nucleosomal DNA (Fig. 1B, top panel). 

The three modules of Mediator, Head^MED,
Middle^MED, and Tail^MED, make multiple
contacts with PIC and the nucleosome (Fig.
1B and movie S1). Tail^MED flanks the outside and
has no contact with PIC. Head^MED packs against
the Pol II stalk and the TFIIH core above the
PIC core, and the nucleosome-proximal end is
capped by the CDK-activating kinase of TFIIH
(CAK^TFIIH). Middle^MED adopts an extended
conformation with MED1 binding Tail^MED at
one end and the Hook^MED submodule binding
the nucleosome at the other end.

The nucleosome makes multiple contacts 
with Mediator and TFIIH and resembles a 
tilted rolling wheel with the 5’ end of nucleo- 
somal DNA naturally tethered to the 3’ end 
of the downstream promoter (Fig. 1B and 
movie S1). The nucleosome is highly dynamic 
relative to PIC-Mediator, as indicated by its weak 
cryo-EM density. The tether and contacts are 
separated around the nucleosome core par- 
ticle and together bring PIC-Mediator to the 
nucleosome. The overall structure suggests a
direct role of the +1 nucleosome in organizing
PIC-Mediator on chromatin.

Fig. 1. Structure of the nucleosome-bound PIC-Mediator complex. (A) Schematic
diagram of the T40N nucleosome scaffold. Numbers represent the positions relative to
TSS (+1). (B) Composite cryo-EM map and structural model of PIC^T40N-MED in two
different views. Cryo-EM map of the nucleosome is shown as a transparent surface
covering the structural model. A close-up view of the promoter and PIC core is shown in
the top panel. The color scheme of the structural model is described at the bottom.

Fig. 2. Binding of PIC-Mediator to T40N nucleosome. (A and B) Close-up
views of the interactions between PIC-Mediator and the nucleosome in the
T40N^MH (A) and T40N^H (B) states. Top panels show transparent cryo-EM maps
covering the structural model in two different views. Intermodular contacts are
indicated with numbers. Arrows indicate the direction of XPB-mediated promoter
translocation. Bottom panels show the binding of PIC-Mediator to the


Structures of PIC^T40N-MED in two
nucleosome-binding states 


Cryo-EM 3D classification showed the PIC^T40N-MED
complex in two conformations, in which
PIC-Mediator binds the nucleosome core particle
in two distinct states (Figs. 2 and 3B, figs. S1
to S4, and movies S2 to S4). Locally refined
cryo-EM maps supported the placement of the
nucleosome and guided structural analysis of
intermodular contacts. In the T40N^MH state
(where “M” and “H” denote the nucleosome
binding of Mediator and TFIIH, respectively),
Hook^MED contacts nucleosomal DNA and the
TFIIH core contacts both histones and nucleosomal
DNA. In the T40N^H state, the TFIIH core
contacts nucleosomal DNA, whereas Mediator
detaches from the nucleosome.

Compared with the T40N^MH state, the structure
in the T40N^H state shows displacement
of the nucleosome by ~20 Å and rotation of the
nucleosome-tethered downstream promoter by
~10° (Figs. 2C and 3B, fig. S4, and movie S2).
Locally refined cryo-EM maps show canonical
nucleosome conformation with DNA wrapping
around the histone octamer (fig. S3). In both
structures, the 5’ end of nucleosomal DNA and
the edge of the XPB-bound promoter at +21 are
separated by ~60 Å, roughly equivalent to the length
of a slightly bent 20-bp DNA (fig. S4A), suggesting
that the reconstituted nucleosome is well positioned
on the Widom-601 segment. Therefore,
the distinct nucleosome placement may result
from intrinsic dynamics and/or conformational
heterogeneity of the nucleosome-bound PIC-
Mediator complex.
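
The comparison between the ~60 Å separation and a 20-bp DNA segment can be checked with simple arithmetic. The sketch below assumes the textbook B-DNA rise of about 3.4 Å per base pair, a value that is not stated in the text; the names are illustrative.

```python
# Assumption: canonical B-DNA rise per base pair (textbook value, not from the text).
RISE_PER_BP_ANGSTROM = 3.4


def straight_bdna_length(n_bp: int) -> float:
    """End-to-end length (in angstroms) of an ideal, straight B-DNA segment."""
    return n_bp * RISE_PER_BP_ANGSTROM


# Promoter DNA between the XPB-bound edge (+21) and the nucleosome (+40), inclusive.
LINKER_BP = 40 - 21 + 1

if __name__ == "__main__":
    straight = straight_bdna_length(LINKER_BP)
    print(f"{LINKER_BP} bp of straight B-DNA span about {straight:.0f} angstroms")
    print("observed separation: ~60 angstroms")
```

A straight 20-bp segment would span about 68 Å under this assumption, so an end-to-end distance of ~60 Å is what one would expect if the segment is slightly bent.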


Binding of PIC-Mediator to the nucleosome 
in the T40N^MH and T40N^H states


A cryo-EM map of PIC^T40N-MED in the T40N^MH
state shows direct contact between the TFIIH
core and the promoter-proximal half of the
nucleosome (Fig. 2A; fig. S4, A and B; and movie
S3). Further analysis suggests electrostatic
interaction between an exposed, positively
charged motif (R^294KRKSRR^300) of p52 and a
well-characterized acidic patch of the H2A.Z- 
H2B heterodimer. The nucleosome-binding 
pattern is reminiscent of a general nucleosome 
recognition and association by a positively 
charged motif (a well-known arginine anchor) 
of chromatin factors and enzymes (36). More- 
over, a positively charged patch of XPB within 
the TFIIH core contacts nucleosomal DNA. 
The binding of the TFIIH core to the histone 
acidic patch may contribute to the regulatory 
role of PIC in H2A.Z turnover (24). 

Hook^MED makes direct contact with nucleosomal
DNA (Fig. 2A; fig. S4, A and B; and
movie S3). The placement of Hook^MED near the
nucleosome suggests that a positively charged
motif (KKVKEK) of MED19 generates
electrostatic interaction with the phosphate
groups of nucleosomal DNA. The promoter-distal
half of the nucleosome is covered by
unassigned cryo-EM density, which might be
derived from nearby Mediator subunits, presumably
MED26 and MED19 (fig. S4D). The
N-terminal domain of the metazoan-specific
Mediator subunit MED26 (NTD^MED26) contains
a positively charged patch, and a C-terminal
positively charged region (residues 199 to 220)
of MED19 is unstructured and may bind to
the histone acidic patch or the acidic surface
of nucleosomal DNA.

nucleosome through putative electrostatic interactions. The positively charged
motif of p52 within the TFIIH core and the H2A.Z-H2B acidic patch are shown in
electrostatic potential surfaces. Positively charged residues of MED19 are
highlighted in blue balls (A). (C) Structural differences of PIC^T40N-MED in the
T40N^MH and T40N^H states. Cryo-EM maps are shown with Pol II superimposed
(Pol II is omitted for clarity).

Compared with the T40N^MH state, the cryo-EM
map in the T40N^H state shows the following
structural differences (Figs. 2 and 3B, fig.
S4, and movie S4): the nucleosome detaches
from Hook^MED and XPB, the downstream
promoter at +21/+40 is more relaxed and 
extended, and the displacement of the nu- 
cleosome relative to the TFIIH core leads to 
dissociation of p52 from the H2A.Z-H2B acidic 
patch and generates contact between the 
positively charged motif of p52 and the neg- 
atively charged phosphate groups of nucleoso- 
mal DNA. 

Binding of the nucleosome does not alter the overall structure of
PIC-Mediator, but it may stabilize the complex on chromatin through several
contacts: tethering of nucleosomal DNA to the downstream promoter; the
interaction between the nucleosome and p52 of the TFIIH core; binding of
nucleosomal DNA to Hook^MED (MED19 and MED26) and XPB of the TFIIH core (in
the T40N^MH state); contact between TFIID-C (TAF2) and the p52-p8 dimer of
the TFIIH core; and interactions among Pol II, XPB, and the downstream
promoter at +11/+21 (Fig. 2, A and B; fig. S4; and movies S3 and S4).
Interacting subunits are stably integrated within corresponding complexes
and modules and then mediate intermodular contacts that are relatively
dynamic. Although some of the contacts might be relatively weak, adding them
to the existing interactions between the promoter and PIC-Mediator (26) may
facilitate the stabilization of PIC-Mediator on chromatin. Once PIC-Mediator
binds the +1 nucleosome, the TFIIH core and Hook^MED may also peel off
nucleosome-bound factors and enzymes and prevent their potential
interference with transcription initiation.

Apart from the stabilization of the PIC-Mediator complex on chromatin, the
network interactions may also modulate XPB activity (Fig. 2, A and B, and
fig. S4). Modular organization of the nucleosome-bound PIC-Mediator complex
is reminiscent of ATP-dependent chromatin remodelers, which bind the
nucleosome and generate DNA translocation relative to the histone octamer
(37). The difference is that XPB pulls DNA away from the nucleosome, whereas
chromatin remodelers push DNA toward the nucleosome. Further studies may
reveal how the nucleosome affects XPB-mediated DNA translocation and whether
it undergoes translocation relative to promoter DNA during transcription
initiation.


PIC-Mediator prefers to bind T40N and
contacts T50N but not the T70N nucleosome


Genome-wide sequencing showed that core promoters are flanked by the
precisely positioned +1 nucleosome and phased downstream nucleosomes,
implying a critical role of the gap between the nucleosome and transcription
machinery in the regulation of transcription initiation (1-5, 21-23). To
investigate the effect of TSS-nucleosome separation on the organization of
PIC-Mediator on chromatin, we reconstituted H2A.Z-containing T50N and T70N
nucleosomes, assembled nucleosome-bound PIC-Mediator complexes, and
determined their cryo-EM structures (Fig. 3, fig. S1, and data S1 and S2).
Cryo-EM maps show that PIC-Mediator directly contacts the nucleosome core
particle in PIC^T50N-MED but not in PIC^T70N-MED. In PIC^T70N-MED, the
promoter-tethered nucleosome was present during sample preparation (fig.
S1D) but was invisible in the cryo-EM map, reflecting its flexibility due to
the lack of stabilization. Thus, the promoter-bound PIC-Mediator contacts
the nucleosome if it is 40 or 50 bp (but not 70 bp) downstream of TSS.

Fig. 3. Structures of PIC-Mediator bound to differently positioned nucleosomes.
(A) Schematic diagram of the three nucleosome scaffolds. (B to D) Composite
cryo-EM maps of PIC^T40N-MED (B), PIC^T50N-MED (C), and PIC^T70N-MED (D).
Multiple nucleosome-binding states are shown in (B) and (C), with conformational
differences indicated by arrows. The putative position of the detached T70N
nucleosome is indicated as a dashed circle in (D). (E) Close-up views of the
nucleosome in four distinct binding states in PIC^T50N-MED. Cryo-EM maps are
shown with Pol II superimposed (Pol II omitted) for comparison. Note that
the cryo-EM maps of the nucleosome [(C) and (E)] are weaker than that in
PIC^T40N-MED (B), reflecting the increase in nucleosome dynamics.

Cryo-EM 3D classification of PIC^T50N-MED revealed four distinct
conformations, in which the nucleosome binds (i) Mediator (T50N^M), (ii)
Mediator and TFIIH (T50N^MH), or (iii) TFIIH (T50N^H) or (iv) detaches from
PIC-Mediator (T50N^D) (Fig. 3, C and E; fig. S5; movie S5; and data S2).
Comparison of the four conformations (from T50N^M to T50N^D) shows gradual
relaxation of the downstream promoter and dissociation of the nucleosome
from PIC-Mediator. Compared with that in the T50N^M state, the structure in
the T50N^D state shows a rotation of the downstream promoter by ~35° and
displacement of the nucleosome by ~105 Å away from Mediator.

Fig. 4. The +1 nucleosome may enhance transcription initiation by binding
PIC-Mediator. (A) Schematic diagram of the nucleosome scaffolds for
transcription initiation assays. (B and D) In vitro transcription initiation
assay using naked DNA and the T40N and T70N nucleosomes as templates. RNA
products from one of the three independent experiments were visualized by
autoradiography. Relative transcription (Txn) activity is indicated below
each lane. ND, not determined. CDK7(KD), kinase-dead mutant of CDK7.
H2A.Z-containing nucleosomes were used if not specified. (C) PIC^T40N-MED
(as a reference) and the PIC^T40N-MED mutants containing mutated subunits
and histone substitution (H2A.Z replaced by H2A) as indicated. Schematic
diagram shows the substitution or deletion of critical regions in the
mutated subunits. The maps are shown with Pol II superimposed for
comparison (Pol II omitted).

In the T50N^M state, the nucleosome does not contact the TFIIH core, and the
downstream promoter undergoes considerable bending before running into the
nucleosome (Fig. 3E and fig. S5). The nucleosome is positioned higher than
in all other structures, and the downstream promoter is bent the most. This
promoter bending and tension likely result from binding of Hook^MED to
nucleosomal DNA. In the T50N^MH state, the nucleosome retains marginal
contact with Hook^MED and the TFIIH core contacts the H2A.Z-H2B acidic
patch. In the T50N^H state, the nucleosome detaches from Mediator, whereas
the TFIIH core contacts nucleosomal DNA. Nucleosome-binding patterns in the
T50N^MH and T50N^H states are similar to those of PIC^T40N-MED in equivalent
states except for the rotation of the nucleosome (dyad axis) by ~35° (fig.
S5, E and F), which likely results from the additional 10-bp gap between the
3’ end of the downstream promoter and the 5’ end of nucleosomal DNA.

The structure in the T50N^D state shows a
fully detached nucleosome, which naturally
flanks the downstream promoter but has no
direct contact with PIC-Mediator (Fig. 3E
and fig. S5). The increase in gap between the
nucleosome and TSS from 40 to 50 bp likely
generates higher DNA-bending tension and
favors the detachment of the nucleosome
from PIC-Mediator. Consistently, the nucleosome
in PIC^T50N-MED is more dynamic than
that in PIC^T40N-MED, as evidenced by the
higher flexibility around the nucleosome and
by nucleosome detachment in the T50N^D state.

Multiple nucleosome-binding patterns in
PIC^T40N-MED and PIC^T50N-MED indicate that
PIC-Mediator prefers to bind the nucleosome
40 bp downstream of TSS and tolerates variation
of the nucleosome position to some extent,
which is consistent with the genomic positioning
of the +1 nucleosome relative to TSS
(21-23). Such a nucleosome-binding feature
also agrees with the characteristics of non-
sequence-specific electrostatic interactions
(Fig. 2), which could accommodate modular
fluctuation by changes in binding sites.
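
The binding states described in this section can be summarized as a small lookup table. This is an illustrative sketch only: the state labels follow the "M" (Mediator) and "H" (TFIIH) convention defined in the text, "D" is used here as an assumed label for the detached state, and the dictionary and function names are hypothetical.

```python
# Which partner(s) the nucleosome contacts in each state label.
STATE_CONTACTS = {
    "M": {"Mediator"},
    "MH": {"Mediator", "TFIIH"},
    "H": {"TFIIH"},
    "D": set(),  # assumed label for the detached nucleosome
}

# States reported for each scaffold in the text.
OBSERVED_STATES = {
    "T40N": ["MH", "H"],
    "T50N": ["M", "MH", "H", "D"],
    "T70N": [],  # nucleosome not visible in the cryo-EM map
}


def contacts_nucleosome(scaffold: str) -> bool:
    """True if any observed state of the scaffold shows a nucleosome contact."""
    return any(STATE_CONTACTS[state] for state in OBSERVED_STATES[scaffold])


def partners(scaffold: str) -> set:
    """All PIC-Mediator modules seen contacting the nucleosome on a scaffold."""
    found = set()
    for state in OBSERVED_STATES[scaffold]:
        found |= STATE_CONTACTS[state]
    return found


if __name__ == "__main__":
    for name in OBSERVED_STATES:
        print(name, contacts_nucleosome(name), sorted(partners(name)))
```

Laid out this way, the table makes the positional preference explicit: contacts are seen at 40 and 50 bp, with the detached state appearing only at 50 bp, and none are resolved at 70 bp.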


The nucleosome may enhance transcription 
initiation by binding PIC-Mediator 

We next performed an in vitro transcription initiation assay using Pol II,
general transcription factors, and Mediator. Reconstituted
promoter-nucleosome and naked promoter templates with the same DNA scaffold
were used as substrates for comparison (Fig. 4A). Each DNA scaffold contains
two continuous guanine deoxynucleotides on the non-template strand at
positions +31 and +32 (called +31G) downstream of TSS. The reactions were
performed in the presence of chain-terminating 3'-O-methyl GTP instead of
normal 3'-OH GTP. Thus, Pol II halts at position +31 to avoid the barrier
effect of the nucleosome.

TATA box-containing promoters showed comparable transcription activities on
templates of the naked promoter and nucleosome, generating the expected RNA
products of 31 nucleotides (Fig. 4B and fig. S6C). By contrast, the
nucleosome enhanced activity on the T40N (+31G) TATA-like promoter, with
consensus TATA box motif TATAAAAG being replaced by CATAAGAG (derived from
the human RPLP1 gene promoter) (fig. S6A) (25). The nucleosome also enhanced
activity on the TATA-less promoter, but the signals were too weak to be
accurately quantified. Activity enhancement was not observed on the T70N
(+17G) TATA-like promoter, consistent with the structural observation that
PIC-Mediator does not bind the T70N nucleosome (Fig. 3D). Therefore, the
nucleosome may enhance transcription initiation on TATA-like and TATA-less
promoters. TATA box promoters may be strong enough to support efficient
PIC-Mediator assembly and thus bypass the effect of the nucleosome.

To evaluate the role of nucleosome-contacting regions in the organization of
PIC-Mediator on nucleosome and transcription initiation, we reconstituted
PIC-Mediator mutants containing mutated subunits in which the putative
nucleosome-contacting regions were deleted or mutated by changing positively
charged residues to negatively charged residues (Fig. 4C, fig. S1, and
materials and methods). The PIC-Mediator mutants were individually assembled
with H2A.Z-containing T40N, followed by structure determination (data S1 and
S2), and subjected to transcription assay using the nucleosome on the T40N
(+31G) TATA-like promoter (Fig. 4D and fig. S6D).

The MED26-mutated complex (MED26-mut) showed a conformation in the T40N^H
state, and no T40N^MH state was observed (Fig. 4C and fig. S6B). The
MED19-mut and MED19-/26-mut complexes showed that the nucleosome contacts
p52, as in the T40N^H state, but is more separated from Hook^MED, which is
consistent with the loss of restraint by the Hook^MED subunits, MED19 and
MED26. Cryo-EM maps of the three mutated complexes lack the extra density on
the promoter-distal half of nucleosome, supporting that the density might be
derived from MED19 and/or MED26 (fig. S4, D and E). The MED26-mut and
MED19-mut complexes showed a moderate decrease in transcription activity,
and the MED19-/26-mut complex exhibited an evident decrease (Fig. 4D and
fig. S6D). Thus, nucleosome-Hook^MED interactions through MED19 and MED26
might be required for nucleosome-enhanced transcription initiation. The
N-terminal domain of MED26 has been reported to bind TFIID (38), suggesting
an effect of MED26 through TFIID, the +1 nucleosome, or both.

Compared with PIC^T40N-MED, in the p52-mut complex, p52 contacts neither
H2A.Z-H2B heterodimer (in T40N^MH) nor nucleosomal DNA (in T40N^H), leading
to a considerable rotation of nucleosome away from the mutated p52 (Fig. 4C
and fig. S6B). The p52-mut complex showed comparable transcription activity
(Fig. 4D and fig. S6D), consistent with the remaining contacts of the
nucleosome with Hook^MED and XPB (Fig. 4C). The structure of PIC^T40N-MED on
H2A-containing T40N reveals the placement of the nucleosome slightly
separated from Hook^MED compared with that in the T40N^H state (Fig. 4C and
fig. S6B). No T40N^MH state was observed. A slight increase in transcription
was observed for reactions using H2A.Z-containing nucleosome compared with
H2A nucleosome (Fig. 4D and fig. S6D). Thus, the substitution of H2A by
H2A.Z alters the organization of PIC-Mediator on the promoter-nucleosome and
seems to favor transcription initiation, consistent with its specific
presence in the +1 nucleosome.

Fig. 5. Schematic model of PIC-Mediator organization on chromatin.
PIC-Mediator assembly on chromatin starts with the recruitment of Mediator
and PIC components to the core promoter by promoter-positioning elements
and histone modifications of the +1 nucleosome. PIC-Mediator is positioned
and organized on chromatin by integrating promoter-positioning information
and the position of the +1 nucleosome. The effect of the +1 nucleosome
on the following transcription processes awaits further investigation. PHD, plant
homeodomain; BRD, double bromodomain.


The +1 nucleosome serves as a hub 
to orchestrate PIC-Mediator organization 
on chromatin 


It is known that the +1 nucleosome recruits tran- 
scription regulators through histone modifica- 
tions (4). Within TFIID, the plant homeodomain 
of TAF3 specifically recognizes H3K4me3 (19), 
and the double bromodomain of TAF1 recog- 
nizes histone acetylation (20). These two read- 
ers are tethered to TFIID through intrinsically 
disordered regions and are invisible in struc- 
tures reported to date (25, 26). Compared with 
the H2A.Z nucleosome, the acetylated-H2A.Z 
nucleosome showed slightly higher transcrip- 
tion (Fig. 4D and fig. S6D), suggesting a favor- 
able recruitment of TFIID and PIC-Mediator 
assembly near the +1 nucleosome (Fig. 5). De- 
spite the known critical roles of the +1 nucleo- 
some in transcription regulation, its intimate 
binding to transcription machineries was neither 
structurally visualized nor previously expected. 
Our study shows that the +1 nucleosome binds 
PIC-Mediator and may enhance transcription 
initiation. The nucleosome and nucleosome- 
bound regions of PIC-Mediator are highly flex- 
ible, and the nucleosome binds PIC-Mediator 
through multiple charge-charge contacts. Such 
dynamic binding of PIC-Mediator to the nu- 
cleosome permits PIC-Mediator to integrate 
sequence information of the core promoter
and positional information of the highly regulated
+1 nucleosome. Genomic positioning of
PIC-Mediator and the site of transcription initiation
may depend not only on the positioning
elements of core promoters (5, 39) but also on
the position of the +1 nucleosome.

Eukaryotic cells share similar features in chro- 
matin architecture around core promoters 
(-5, 21-23) and modular organization of PIC- 
Mediator on DNA templates (26-29), suggest- 
ing a conserved mechanism of PIC-Mediator 
assembly on chromatin. Distinct from the ~40-bp 
separation in human cells, TSS is positioned 
~10 bp into the +1 nucleosome in yeast (22). 
This difference may result from distinct TSS 
selection mechanisms: Human Pol II starts 
transcription at a site ~30 bp downstream of 
TATA box (5), whereas yeast Pol II scans the 
promoter and finds the TSS ~40 to 120 bp 
downstream of the TATA box (40). 

In summary, the +1 nucleosome not only 
exists as a barrier that should be overcome 
during early transcription elongation, but it 
also serves as a hub organizing and modulating 
the PIC-Mediator complex on chromatin. The 
critical role of the +1 nucleosome in transcription
should be considered throughout the early
transcription process, from PIC assembly and
disassembly through transcription initiation, pausing,
and pause release to passage of Pol II through
the +1 nucleosome during early elongation.

Establishing causal links between inherited polymorphisms and cancer risk is challenging. Here, we focus
on the single-nucleotide polymorphism rs55705857, which confers a sixfold greater risk of isocitrate
dehydrogenase (IDH)-mutant low-grade glioma (LGG). We reveal that rs55705857 itself is the causal
variant and is associated with molecular pathways that drive LGG. Mechanistically, we show that
rs55705857 resides within a brain-specific enhancer, where the risk allele disrupts OCT2/4 binding,
allowing increased interaction with the Myc promoter and increased Myc expression. Mutating the
orthologous mouse rs55705857 locus accelerated tumor development in an Idh1^R132H-driven LGG mouse
model from 472 to 172 days and increased penetrance from 30% to 75%. Our work reveals mechanisms
of the heritable predisposition to lethal glioma in ~40% of LGG patients.


The vast majority of cancer-related risk
single-nucleotide polymorphisms (SNPs)
identified by genome-wide association
studies (GWASs) are located in noncoding
regulatory regions (1, 2). These
GWAS tag SNPs are usually in linkage disequilibrium
with one or more causative variants
that generally remain unknown. How
such noncoding germline variants interact
with acquired somatic mutations to facilitate
cancer development often remains elusive.
We previously identified several glioma
susceptibility variants at 8q24.21, and rs55705857
was the SNP with the largest odds ratio (OR).
Conferring an approximately sixfold greater
relative risk of developing IDH-mutant low-grade
glioma (LGG) (3-5), rs55705857 is one
of the strongest reported inherited genetic associations
with cancer, comparable with inherited
BRCA1 gene mutations and the risk
of developing breast cancer or other familial
glioma genes such as NF1/2, CDKN2A, or TP53
(Fig. 1A). Interestingly, rs55705857 is not associated
with the risk of IDH-wild type (IDH-WT)
glioma or other cancers, including IDH-mutant
acute myeloid leukemia (6-8).

LGGs are slow-growing brain tumors that
eventually progress to aggressive glioblastoma.
About 70% of LGGs harbor transforming mutations
in IDH1 or IDH2. Mutations usually affect
codon 132 of IDH1 (R132H/C/S; R, Arg; H, His;
C, Cys; S, Ser) or, less commonly, the homologous
codon 172 of IDH2 (R172K/W/G; K, Lys;
W, Trp; G, Gly). Whereas WT IDH isozymes
metabolize isocitrate to α-ketoglutarate (αKG),
mutant IDH reduces αKG to the oncometabolite
R-2-hydroxyglutarate (R-2HG), which alters
the metabolic balance of affected cells (9, 10).
Moreover, R-2HG is structurally similar to αKG
and competitively inhibits αKG-dependent
dioxygenases such as 5-methylcytosine hydroxylases
and histone lysine demethylases
(KDMs). This gives rise to the characteristic
LGG CpG island methylation phenotype,
perturbs histone modifications, and alters expression
profiles of IDH-mutant gliomas (11-13).
IDH-mutant LGGs are subdivided into two
types on the basis of their co-occurring genomic
alterations: molecular oligodendroglioma, defined
by co-deletion of chromosomal arms 1p
and 19q (“codel”) together with TERT promoter
mutation, and the more aggressive molecular
astrocytoma, characterized by inactivation of TP53
together with ATRX mutations (“noncodel”)
(5, 14, 15).

In this study, we sought to reveal the mole- 
cular underpinnings for the specific and strong 
association between rs55705857 and IDH- 
mutant LGG as a basis for understanding 
LGG initiation and heritable risk of develop- 
ing glioma in ~40% of IDH1/2-mutant LGG 
patients carrying the rs55705857-G risk allele. 


Results 

Fine-mapping of inherited risk SNP variants at
8q24.21

To clarify whether the rs55705857-G risk allele 
itself or other nearby SNPs were associated 
with LGG risk, we examined detailed haplotypes 
in genotyping data from 622 IDH-mutant LGG
cases and 668 controls (6, 7). Identification of 
recombination events involving the risk haplo- 
type allowed mapping the boundaries of the 
minimal region containing the causative variant 
(Fig. 1B). The minimal causative region con- 
tained only two loci that previously met the 
criteria for genome-wide significance (P < 1.0 x 
10-*) (4): rs147958197 and rs55705857. Some
subjects with the rs55705857-G risk allele did
not have the rs147958197-C risk allele, but all
subjects with the rs147958197-C risk allele also
had the rs55705857-G risk allele, suggesting
that rs147958197 occurred on the haplotype
containing the rs55705857-G allele. Notably,
we did not observe a significant difference in
the OR for developing glioma between patients
carrying only the rs55705857-G risk allele
and those carrying both the rs147958197-C and
rs55705857-G risk alleles (Fig. 1B). Results
from sequencing six germline DNA samples
containing a total of seven risk and five nonrisk
haplotypes did not identify any additional
SNPs within the minimal causative region,
thus identifying rs55705857-G as the likely causative
8q24.21 risk variant for IDH-mutant LGG.
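
The two quantities used in this fine-mapping argument, the odds ratio for carrying a risk allele and the nesting of one set of carriers inside another, can be illustrated with a minimal sketch. It is not the analysis performed in the study: the cohort sizes (622 cases, 668 controls) are taken from the text, whereas the carrier counts, the subject identifiers, and the function names `odds_ratio` and `is_nested` are hypothetical and chosen only so that the estimate lands near the reported sixfold risk.

```python
import math


def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds ratio for carrying the risk allele, with a 95% Woolf confidence interval."""
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid division by zero
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


def is_nested(inner_carriers, outer_carriers):
    """True if every carrier of the inner allele also carries the outer allele."""
    return set(inner_carriers) <= set(outer_carriers)


# Cohort sizes come from the text; the carrier counts below are invented for illustration.
HYPOTHETICAL_CASE_CARRIERS = 250
HYPOTHETICAL_CONTROL_CARRIERS = 67
estimate, lower, upper = odds_ratio(
    HYPOTHETICAL_CASE_CARRIERS, 622, HYPOTHETICAL_CONTROL_CARRIERS, 668
)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")

# Hypothetical subject IDs: all rs147958197-C carriers also carry rs55705857-G,
# but not the other way round, as described for the risk haplotype.
g_carriers = {"s1", "s2", "s3", "s4"}
c_carriers = {"s2", "s4"}
print(is_nested(c_carriers, g_carriers), is_nested(g_carriers, c_carriers))
```

If the odds ratio among carriers of rs55705857-G alone is not distinguishable from that among carriers of both alleles, the nested allele adds no detectable risk of its own, which is the reasoning applied above.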


rs55705857 is located within an enhancer active
in the brain and LGG


rs55705857 resides in an intron of the long
noncoding RNA CCDC26, raising the possibility
that this locus has a gene regulatory function.
Mining Roadmap and ENCODE data revealed
enrichment of histone modifications consistent
with enhancer activity (H3K27ac, H3K4me1,
and deoxyribonuclease I hypersensitivity) at
the rs55705857 locus in neuronal and melanocyte
lineages but not in any other lineages (fig.
S1). Consistent with these data, examination
of assay for transposase-accessible chromatin
sequencing (ATAC-seq) data from The Cancer
Genome Atlas (TCGA) (16) revealed chromatin
accessibility at rs55705857 almost exclusively
in IDH-mutant LGG and cutaneous melanoma
(fig. S2A), suggesting that rs55705857 lies within
an enhancer that is active in very selective cell
lineages.

We next assessed epigenomic profiles 
of IDH-mutant human glioma. Chromatin- 
immunoprecipitation sequencing (ChIP-seq) 
revealed enrichment for the activating his- 
tone H3 lysine 27 acetylation (H3K27ac) and 
lysine 4 monomethylation (H3K4me1) marks
spanning rs55705857. This enhancer profile
was more pronounced in IDH-mutant tumors
than in IDH-WT tumors, with 3.05- and 1.58-fold
greater signals for H3K27ac and H3K4me1,
respectively (DiffBind; P = 5.81 x 10~’ and P = 
2.31 x 10°). Of note, active enhancer and pro- 
moter marks as inferred by the ChromHMM 
algorithm extended over 10 kb up- and down- 
stream of rs55705857, which was not observed 
in either IDH-WT tumors or brain gliosis samples
without tumors (Fig. 1C and fig. S1B).
However, there were no significant differences
in H3K27ac and H3K4me1 in IDH-mutant tumors
stratified by rs55705857 genotype (Fig. 1C).
This suggests that IDH mutation, but not
rs55705857 genotype, increases the enhancer
activity of this locus in tumors.


rs55705857-G risk allele enhances an 
LGG-specific transcriptional profile 


To delineate the functional impact of rs55705857, 
we performed expression quantitative trait loci 
(eQTL) analysis by correlating RNA sequenc- 
ing (RNA-seq) transcriptional profiles with 
rs55705857 genotypes in 30 IDH-mutant codel,
29 IDH-mutant noncodel, and 27 IDH-WT
human gliomas. PVT1 expression was significantly
lower and CCDC26 expression was
significantly higher in IDH-mutant tumors
than in IDH-WT tumors (P = 1.1 x 107° and 
5.9 x 10°, respectively), and MYC expression 
was significantly up-regulated in all tumors 
compared with gliosis (IDH-mutant P = 3.1 x
10° and IDH-WT P = 7.8 x 10°). However, 
the rs55705857 allele did not appear to alter 
expression of genes in the region, which was 
corroborated in TCGA data (table S1). These 
results highlight that the transcriptional ef- 
fect of IDH mutations is quite substantial, 
whereas transcriptional impact of disease- 
associated polymorphisms may be subtle. 

Fig. 1. rs55705857-G is the causal glioma risk variant at 8q24. (A) Fine-mapping
of the 8q24 tag SNP allowed the discovery of rs55705857 with much
lower allele frequency and much higher effect size than the originally discovered
tag SNP. Of the 16 IDH-mutant risk SNPs listed, rs55705857 has an OR high
enough to have an effect near that of familial inheritance glioma genes. (B) Fine-mapping
of the minimal-risk haplotype region surrounding the IDH-mutant glioma
risk SNP rs55705857. Subjects heterozygous for the risk haplotype and with
meiotic crossovers disrupting the risk haplotype fall into four groups: two including
the minimal overlap region (groups B and C) and two not including the
minimal overlap region (groups A and D) (red, 55 cases; blue, 22 controls).
(C) Gene transcripts, conservation between human and mouse, and chromatin
status surrounding rs55705857 are displayed. The red vertical line denotes the
location of rs55705857. ATAC-seq data for the 8 IDH-WT and 13 IDH-mutant brain
tumors and skin cutaneous melanoma (SKCM) are aligned with DiffBind log2 fold
change for H3K27ac and H3K4me1 when comparing IDH-mutant tumors against
IDH-WT brain tumors. ChromHMM shows the predicted function of the genome
surrounding rs55705857 on the basis of the histone marks H3K36me3, H3K4me1,
H3K27ac, and H3K4me3 in IDH-WT and IDH-mutant brain tumors as well as
nontumor gliosis samples sorted by rs55705857 nonrisk (A) and risk (G) alleles.
(D) Comparison of GSEA results using 50 hallmark gene sets comparing
IDH-mutant noncodel, IDH-mutant codel, or IDH-WT tumors versus gliosis and
IDH-mutant noncodel rs55705857-G versus A allele tumors. Only gene sets with
an FDR q < 0.05 in at least one comparison are included and colored in the
heatmap; the darker reds and blues have an FDR q < 0.001. Bottom panel shows
GSEA of the indicated glioma MYC target gene signatures across the different
glioma subtypes.

To delineate more-subtle differences, we
used gene set enrichment analysis (GSEA) to
compare IDH-mutant noncodel LGG with
gliosis and to compare rs55705857-G with rs55705857-A
IDH-mutant noncodel LGG. Both comparisons
showed up-regulation of similar hallmark gene
sets such as epithelial-to-mesenchymal transition
(EMT), interleukin-2 (IL-2) and IL-6 signaling,
inflammatory responses, hypoxia, G2M checkpoints,
p53 pathway, interferon and tumor
necrosis factor (TNF) signaling, and a strong
down-regulation of genes involved in oxidative
phosphorylation (Fig. 1D, fig. S2B, and
table S2). This suggests that the rs55705857-G
risk allele has a consequential functional
role in augmenting the underlying biology of
LGG. Given that rs55705857 is within the ~2-Mb
region that is known to regulate expression of
the MYC oncogene in several cancers (17), we
analyzed MYC gene sets. Both hallmark MYC
subsets (MYC targets V1 and V2) were significantly
up-regulated in IDH-mutant noncodel
LGG compared with gliosis, but we failed to
observe a significant difference between the
rs55705857-G and rs55705857-A tumors (Fig.
1D, fig. S2B, and table S2).


To further explore a potential relation between
rs55705857 and MYC, we analyzed all 63
previously described MYC target signatures.
Given MYC’s pleiotropic and context-dependent
effects, these 63 signatures show little overlap
(fig. S3). Still, 25 of the 63 signatures showed
a significant positive enrichment [false discovery
rate (FDR) q < 0.05] in IDH-mutant
noncodel rs55705857-G versus rs55705857-A
tumors (fig. S3 and table S3). To generate a
glioma-specific MYC signature, we performed
ChIP-seq analysis of two IDH-mutant and two
IDH-WT glioma patient-derived xenografts
(PDXs) using two validated MYC antibodies
and integrated the results with RNA-seq data
from the same PDXs delineating direct MYC
target genes. We further developed glioma-specific
gene sets by identifying genes whose
expression showed positive correlation coefficients
with MYC expression in IDH-mutant
noncodel or IDH-WT TCGA gliomas (fig. S4A
and table S3). As expected, this direct glioma
MYC target gene signature showed significant
(FDR q < 0.05) enrichment in IDH-mutant
and IDH-WT glioma when compared with
gliosis as well as in IDH-mutant noncodel
rs55705857-G versus rs55705857-A tumors
(Fig. 1D and figs. S3 and S4B). Interestingly,
the rs55705857-G tumors showed increased
expression of MYC target genes associated
with IDH-WT gliomas, indicating a transcriptional
shift of IDH-mutant rs55705857-G tumors
toward the more aggressive IDH-WT
gliomas (Fig. 1D). In line with this finding, we
observed a similar shift of IDH-mutant noncodel
rs55705857-G tumors toward a more aggressive
IDH-WT-like profile across several
GSEA hallmark signatures (Fig. 1D). Thus, our
GSEA results indicate that the rs55705857-G risk
allele is associated with a more aggressive transcriptional
profile and significantly higher
MYC activity, but we did not find a significant
difference in MYC mRNA expression in
rs55705857-G versus rs55705857-A tumors [P =
0.141; 2076 versus 1681 reads per kilobase per
million mapped (RPKM); table S1].

Fig. 2. rs55705857 SNP resides in a brain-specific enhancer.
(A) Schematic of rs55705857 LacZ enhancer reporter construct.
(B) Representative whole-mount images of LacZ-stained
rs55705857 nonrisk (left) and risk (right) enhancer reporter
embryos. Black arrows denote LacZ staining found in both
reporter mice, while red arrows indicate LacZ staining only found
in risk reporter mice. (C) Summary for enhancer activity of
the nonrisk and risk allele. P-value by Fisher-Freeman-Halton
exact test. n.s., not significant. (D) Representative immunofluorescent
image of a sagittal section of an rs55705857-G
risk allele mCherry enhancer reporter embryo at E14.5 stained
for mCherry and the radial glial marker Sox2. The pontomedullary
hindbrain is shown and arrows depict mCherry/Sox2 double-positive
cells. Scale bars, 50 μm. Similar staining patterns were
observed in the ventricular zone of the forebrain.

Fig. 3. Idh1-mutant LGG mouse model. (A) Schematic of conditional alleles and
CRISPR virus used to generate the LGG cohorts. (B) Survival of mice with the indicated
genotype transduced with an sgRNA targeting Atrx or a scrambled control sgRNA
(Scr). n = 201 mice; P < 0.0001, log-rank (Mantel-Cox) test. (C) Bar graph indicating
percentage of phenotypes found in mice from (B) with the indicated genotype.
(D) Representative hematoxylin and eosin (H&E) and immunohistochemistry (IHC)
staining of the same tumor region within an Idh1^R132H/+;Trp53^R270H/+;Atrx^-/-;Cas9-GFP
brain using the indicated antibodies. Scale bars, 2.5 mm (left) and 50 μm (right).


rs55705857-G risk allele increases and broadens 
enhancer activity in a mouse reporter assay 


The extreme conservation of the rs55705857-A
nonrisk allele and its surrounding sequence
across all mammalian species, including mice
and even platypus (4), prompted us to assess
whether rs55705857 variants influence enhancer
function in vivo. We generated mice carrying
an enhancer construct composed of the highly
conserved 3225-bp-long human fragment, with
the rs55705857-A nonrisk allele (hs1709A) or risk
G allele (hs1709G) at the center of this fragment,
followed by a minimal promoter and a lacZ
reporter gene integrated into the H11 safe
harbor locus (Fig. 2A) (8). Both enhancer
alleles were active in the cells of developing
skin at embryonic day 11.5 (E11.5), consistent with
a melanoblast staining pattern. The variant
hs1709G had additional enhancer activity in
the somite/rib area not observed for the hs1709A
allele (Fig. 2, B and C). At E14.5, both enhancer
alleles became active in the neural tube, forebrain,
and the ribs. Notably, the risk-associated
allele displayed an overall stronger enhancer
activity in these structures and showed additional
activity in the midbrain (Fig. 2, B and
C). To identify the specific cell types with an
active rs55705857 locus, we next generated
analogous hs1709A and hs1709G enhancer
knock-in mice with an mCherry reporter gene.
Co-staining with cell type-specific markers
showed that all SOX2+ and all GFAP+ cells as
well as a subset of OLIG2+ cells were mCherry-positive,
indicating that rs55705857 is active
in all radial glial stem cells and a subset of
oligodendrocyte precursor cells (OPCs). We also
observed co-staining of mCherry with neuronal
markers such as SATB2, CTIP2, MAP2, and
TUJ1 and some overlap with astrocyte marker
S100β (Fig. 2D and fig. S5). Together, these
data indicate that the rs55705857-G risk allele
directly influences strength and tissue specificity
of this developmental enhancer and
that the rs55705857 locus functions as an
active enhancer in the embryonic precursor
cells that give rise to adult neuronal stem cells
(NSCs) and OPCs.

Fig. 4. rs55705857 cooperates with Idh1, Trp53, and Atrx mutations.
(A) Survival of mice with the indicated genotype transduced with an sgRNA targeting
Atrx or a scrambled control sgRNA (Scr). n = 123 mice; P < 0.0001, log-rank
(Mantel-Cox) test. (B) Bar graph indicating percentage of phenotypes found in mice
from (A). (C) Representative H&E and IHC of the same tumor region within a
rs55700P/* dh 824+ Tr) 53™"-Atrx-“~: Cas9-GFP brain using the indicated antibodies.
Scale bars, 2.5 mm (left) and 50 μm (right). (D) Survival of Nod/Scid/γ mice intracranially
injected with rs557°* IdhI®!32/* T7534” 4:Cas9-GFP RIP cells. n = 17 mice.

Fig. 5. rs55705857 modulates OCT2 and OCT4 binding and regulates MYC
expression. (A) The canonical OCT2/4 binding motif (top) and the rs55705857
nonrisk A allele (middle) and the rs55705857 risk G allele (bottom) are
shown. (B) Sequence alignment of the human rs55705857 and its orthologous
mouse locus highlighting conserved binding motifs for ASCL1/2, OCT2/4,
and SOX2/4/9. The nonrisk rs55705857-A allele is marked in red. Asterisks
indicate conserved amino acids. (C) Enrichment of OCT2 and SOX2 at mouse
rs55705857 locus. (Top) ChIP-qPCR using rs557°°P/*-IdhI324/+. Tp 53/4.
Cas9-GFP RIP cells (n = 4). Immunoglobulin G (IgG) serves as a negative control
and histone H3 as a positive control. P-value by two-tailed t test. (Bottom)
Representative gel electrophoresis analysis of PCR amplicons from IgG,
SOX2, OCT2, and H3 ChIP. (D) Enrichment of OCT4 at the mouse rs55705857
locus. (Top) ChIP-qPCR using rs557°0/* idh1®182H/*. Trp534/4-Cas9-GFP RIP
cells transfected with a V5-tagged OCT4 performed using an OCT4 and an
anti-V5 antibody. IgG serves as a negative control, and histone H3 as a positive
control. P-value by two-tailed t test. (Bottom) Representative gel electrophoresis
analysis of PCR amplicons from IgG, H3, OCT4, and V5 ChIP. (E) The risk allele G
of rs55705857 disrupts OCT binding. (Top) ChIP-qPCR showing enrichment
of OCT2 at rs55705857 locus of human LGG cells heterozygous for the
rs55705857 risk allele. IgG-IP serves as a negative control and histone H3 as
a positive control. P-value by two-tailed t test. (Bottom) Sanger sequencing
chromatograms of the SNP region from input, histone H3, and OCT2
ChIPed DNA. (F and G) Myc mRNA (F) and Myc protein (G) expression in
rs55705857 AA, AG, and GG NSCs and NSC-derived OPCs. P-value by two-tailed
t test. (H) Genome architecture mapping (GAM) contact matrix of the

Establishing a mouse model of
Idh1^R132H-mutant LGG

To determine how the rs55705857 locus affects
gliomagenesis, we established an IDH-mutant
LGG model using conditional Idh1^LSL-R132H/+
knock-in mice (Fig. 3A) (19). As expected, transducing
primary NSCs from these Idh1^LSL-R132H/+
mice with an adenovirus expressing Cre resulted
in R-2HG accumulation and drastically affected
the tricarboxylic acid cycle, the glycolysis and
glutaminolysis pathways, and the amounts of
amino acids and nucleotides (fig. S6). We
crossed the Idh1^LSL-R132H/+ mouse to conditional
Trp53^LSL-R270H/+ mice, allowing for concomitant
expression of p53^R270H (homologous to
the human p53^R273H). p53^R270H functions in
a dominant-negative manner, can have gain-of-function
activity (20), and is the most prevalent
p53 mutation found in human LGG
(fig. S7A). To enable CRISPR-Cas9-mediated
somatic mutagenesis of any other LGG-associated
genes, we further crossed these mice to the
LSL-Cas9-GFP mice. To induce the expression
of IDH1^R132H, p53^R270H, and Cas9-GFP, we used
stereotactic injections to deliver Cre-expressing
lentiviral particles to the NSCs residing in
the lateral subventricular zone at postnatal
day 0 (P0), which resulted in clonal induction
of IDH1^R132H expression and accumulation of
R-2HG (fig. S7, B to E). Next, we assessed the
knock-out efficiency of CRISPR-Cas9 using
either a dual fluorescence-based reporter assay
or targeting of endogenous genes such as Urod
or Atrx, revealing a knock-out efficiency of
between 60 and 85% (fig. S8).

Next, we generated cohorts of R26-Cas9-GFP
mice with different combinations of
Idh1^R132H and Trp53 mutations and transduced
them either with an LV-sgAtrx-Cre or
a nontargeting, scrambled LV-sgScr-Cre virus.
Starting at day 301, we observed sarcomas
and lymphomas in Idh1-WT Trp53^LSL-R270H/+
mice, necessitating euthanasia (Fig. 3B). This
is likely because the LOX-STOP-LOX (LSL)
cassette makes the Trp53^LSL-R270H/+ mouse
heterozygous for Trp53, which promotes spontaneous
sarcoma and lymphoma development
(21, 22). None of the 16 Idh1-WT Trp53^LSL-R270H/+
mice developed brain tumors. In contrast, 20%
of the 40 Idh1^R132H/+;Trp53^+/+ and 30% of the
35 Idh1^R132H/+;Trp53^R270H/+ mice transduced with
sgAtrx developed brain tumors in the cerebral
cortex, cerebral striatum, or olfactory bulb,
with a median survival of 463 days. An additional
13 to 14% of these mice exhibited hyperplastic
lesions in the brain (Fig. 3, B to D, and
fig. S9, A and B). Of note, induction of Idh1^R132H
alone or in combination with p53^R270H but
without targeting Atrx did not initiate glioma
formation over a 500-day period (Fig. 3C), consistent
with previous reports (23, 24).

Noncodel LGG is usually associated with
loss-of-heterozygosity of chr17p encompassing
the TP53 locus, suggesting biallelic TP53
inactivation (5, 14, 15). Therefore, we generated
cohorts of Idh1^+/+ and Idh1^R132H/+ mice
harboring either two Trp53^fl alleles (Trp53^fl/fl)
or one Trp53^fl and one Trp53^LSL-R270H allele
(Trp53^LSL-R270H/fl). About 10% of Idh1^+/+;
Trp53^fl/fl or Idh1^+/+;Trp53^LSL-R270H/fl mice transduced
with LV-sgScr-Cre or LV-sgAtrx-Cre
developed brain tumors, as expected for mice
with biallelic Trp53 mutations (22) (Fig. 3C).
Interestingly, with regard to IDH1^R132H-driven
tumorigenesis, we did not observe a difference
in tumor prevalence among heterozygous
p53^R270H (Trp53^R270H/+), complete loss of p53
(Trp53^Δ/Δ), and p53^R270H with loss of the WT p53
allele (Trp53^R270H/Δ). About 30% of a total
of 65 mice in all those cohorts developed brain
tumors with similar latency and histology when
transduced with LV-sgAtrx-Cre, whereas most
LV-sgScr-Cre transduced mice stayed tumor-free
(Fig. 3C and fig. S9C). These data indicate
that p53^R270H functions in a dominant-negative
manner without apparent gain-of-function effects
in this mouse model and demonstrate
that Idh1^R132H cooperates with Atrx and Trp53
mutations in the development of LGG.

All tumors expressed IDH1^R132H; harbored
cells positive for KI67, OLIG2, NESTIN,
GFAP, and PDGFRA; and exhibited a well-differentiated
fibrillary and astrocytic histology
and low apoptotic cell numbers, recapitulating
histopathological and molecular hallmarks
of human LGG (Fig. 3D and fig. S9). Expression
profiling followed by GSEA comparing
Idh1^R132H, Trp53^R270H, and Atrx compound
mutant tumors to WT brain parenchyma
revealed differentially expressed gene sets specifically
associated with EMT, IL-2, hypoxia,
G2M checkpoint, p53 pathway, interferon,
mammalian target of rapamycin (mTOR) signaling,
TNF signaling, MYC, and oxidative
phosphorylation (fig. S10A), reminiscent of
human IDH-mutant noncodel LGG (Fig. 1D).
Cluster analysis with human gliomas of similar
subtype confirmed that the mouse tumors
faithfully recapitulate the human disease
(fig. S10B).

chr15:61,500,000-64,500,000 genomic window showing strong interaction
between Myc and the rs55705857 locus in mouse oligodendrocytes and
their precursor cells (OLGs) in the somatosensory cortex. (I) Analysis of
high-frequency interacting regions at the Myc locus in rs55705857 WT versus
AG neuronal stem cells by 4C-seq. The heatmap color scale shows
normalized median contact frequency. The black trendline shows the median
contact frequency, and the shaded gray area indicates the 20th to 80th
percentiles. The light-blue line marks the location of rs55705857, and the
shaded gray box marks the location of Myc.


Disruption of rs55705857 increases penetrance
and decreases latency of Idh1^R132H-driven glioma


To assess the pathologic potential of rs55705857, 
we generated two mouse strains to evaluate the 
role of the highly orthologous mouse rs55705857
locus in modulating gliomagenesis. One mouse 
line harbors an orthologous rs55705857 A→G
substitution in conjunction with a 4-bp indel 
destroying the protospacer adjacent motif (PAM) 
site (denoted rs557! ) and another line harbors 
a 66-bp deletion spanning the murine rs55705857
locus (denoted rs557°°") (fig. S11, A to C). Both 
lines (called rs557™"" mice) were viable, fertile, 
and displayed no overt phenotype or abnor- 
mal brain histology, indicating that mutating 
the murine rs55705857 locus did not influence 
development. 

We crossed these rs557""" strains with 
Tdh{' SU 81828/+. Trp 53/"-LSL-Cas9-GFP mice 
and injected them with LV-sgScr-Cre or LV- 
sgAtrx-Cre. Both rs557™"" lines exhibited sig- 
nificantly increased penetrance and drastically 
decreased latency of tumor formation compared 
with rs557°/* mice (P < 0.0001) (Fig. 4, A and 
B). Whereas only 5% of the rs557™™ sJdhI*/*; 
Ti rp53! ~ animals injected with either sgScr or 
sgAtrx developed brain tumors, 34 and 75%
of rs557 "dh". Trp53™" animals injected 
with sgScr or sgAtrx, respectively, developed 
brain tumors. The median survival of rs557™"™"; 
TdhP®?8!*.Trp53V" animals injected with 
sgAtrx was 172 days for rs557 and 201 days 
for rs557°O"P compared with a median sur- 
vival of 472 days in rs557 control mice. Tumor 
location and histopathology were not altered 
compared with rs557™' (Fig. 4C). To further 
test whether rs55705857 SNP functions in a 
tumor cell-autonomous manner, we gener- 
ated lentivirus that expresses Cre as well as a 
single guide RNA (sgRNA) targeting the or- 
thologous mouse rs55705857 locus. Compared 
with control LV-sgScr-Cre, LV-sgrs557-Cre- 
injected [dhl®?8/*. Trp53 9 Atra™"|.Caso- 
GFP mice developed gliomas much more 
quickly and with a similar latency as rs557™™" 
mice (P = 0.022), showing that the rs55705857
locus functions in a tumor cell-autonomous 
manner (fig. S11D). In addition, orthotopically 
transplanting rs557°° dhl?" Trp53/— 
tumor cells (RIP cells) into recipient mice 
resulted in the formation of lethal gliomas 
(Fig. 4D and fig. S11E). Together, these data 
demonstrate that disruption of the rs55705857 
locus facilitates glioma development. 
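
The two summary statistics quoted in this section, penetrance and median survival, can be sketched in a few lines. The following is only an illustration under stated assumptions: the follow-up times are invented, the function names `km_median_survival` and `penetrance` are hypothetical, and the study itself compared cohorts with a log-rank (Mantel-Cox) test, which is not reproduced here. Median survival is taken as the first time point at which the Kaplan-Meier estimate falls to 0.5 or below, with tumor-free animals treated as censored.

```python
def km_median_survival(days, tumor_death):
    """First time at which the Kaplan-Meier survival estimate is 0.5 or lower.

    days        -- follow-up time for each animal
    tumor_death -- True if the animal died of tumor at that time, False if censored
    Returns None if the estimate never reaches 0.5.
    """
    records = sorted(zip(days, tumor_death))
    at_risk = len(records)
    survival = 1.0
    index = 0
    while index < len(records):
        t = records[index][0]
        # All animals whose follow-up ends at time t (events and censorings)
        same_time = [event for day, event in records if day == t]
        deaths = sum(same_time)
        if deaths:
            survival *= 1 - deaths / at_risk
            if survival <= 0.5:
                return t
        at_risk -= len(same_time)
        index += len(same_time)
    return None


def penetrance(tumor_death):
    """Fraction of animals in a cohort that developed a tumor."""
    return sum(tumor_death) / len(tumor_death)


# Invented cohorts for illustration only (days of follow-up, tumor death yes/no)
fast_days = [250, 100, 200, 150, 172]
fast_events = [True, True, True, True, True]
slow_days = [100, 200, 300, 400]
slow_events = [True, False, False, False]

print(km_median_survival(fast_days, fast_events), penetrance(fast_events))
print(km_median_survival(slow_days, slow_events), penetrance(slow_events))
```

A cohort in which fewer than half of the animals die of tumor has no defined median under this estimator, which is why the sketch returns `None` in that case rather than a number.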


The rs55705857-G risk allele disrupts an OCT 
transcription factor binding site 


As SNPs in regulatory regions can modulate 
transcription factor binding, we performed 
motif analysis, which revealed that rs55705857 
resides in an octamer-binding protein (OCT) 
transcription factor binding motif (Fig. 5A). 
Notably, the intragenomic replicates (IGR) al- 
gorithm predicted the risk-associated G allele to 
have a significantly lower binding intensity for 
OCT transcription factors compared with the 
reference A allele [~1.8-fold; t test -log10(P) =
3.09]. In addition, the OCT motif is flanked by
a highly conserved SOX2/4/9 and an ASCL1/2
motif (Fig. 5B), all of which play crucial roles 
in brain development (25-28) and glioma- 
genesis (29-31). 
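
How a single-base substitution can lower a predicted binding score can be illustrated with a simple position weight matrix (PWM). This is a swapped-in simplification and not the intragenomic replicates (IGR) algorithm used above. The sketch assumes the classical octamer consensus ATGCAAAT, an invented 0.85 frequency for the consensus base at every position, a uniform background, and an arbitrary position for the A-to-G change; the actual position of rs55705857 within the motif is shown in Fig. 5A and is not encoded here.

```python
import math

BASES = "ACGT"
OCTAMER_CONSENSUS = "ATGCAAAT"  # classical octamer motif bound by OCT factors


def build_pwm(consensus, match_freq=0.85, background=0.25):
    """Log2-odds PWM in which the consensus base has frequency match_freq (assumed value)."""
    mismatch_freq = (1 - match_freq) / 3
    pwm = []
    for consensus_base in consensus:
        column = {}
        for base in BASES:
            freq = match_freq if base == consensus_base else mismatch_freq
            column[base] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def score(pwm, site):
    """Sum of per-position log-odds for a site of the same length as the PWM."""
    if len(site) != len(pwm):
        raise ValueError("site length must match motif length")
    return sum(column[base] for column, base in zip(pwm, site))


pwm = build_pwm(OCTAMER_CONSENSUS)
reference_site = "ATGCAAAT"    # stands in for the nonrisk A allele
alternative_site = "ATGCAGAT"  # A-to-G change at a hypothetical motif position
difference = score(pwm, reference_site) - score(pwm, alternative_site)
print(f"log2-odds drop caused by the substitution: {difference:.2f}")
```

Under these assumptions the substitution replaces one consensus-matching term with one mismatch term, so the score drops by the same amount wherever in the motif the change is placed; a matrix estimated from real binding data would weight positions differently.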

Next, we set out to experimentally test 
whether the rs55705857-G risk allele affects 
binding of OCT transcription factors. We de- 
cided to focus on OCT2 and OCT4, which were 
expressed at low levels in our murine tumors, 
reminiscent of their low-level expression in 
human LGG and glioblastomas (fig. S12, A
and B). Whereas Idh1^R132H-mutant RIP cells retained
expression of OCT2 and SOX2, OCT4
expression was lost upon culturing these 
mouse tumor cells (fig. S12, C and D). We thus 
performed ChIP of endogenous OCT2 and 
SOX2 but had to exogenously express OCT4 
(fig. S12D). Subsequent polymerase chain re- 
action (PCR) amplification of the rs55705857 
locus (ChIP-PCR) revealed that OCT2, OCT4, 
and SOX2 bound preferentially to the murine 
rs55705857-A nonrisk allele compared with
the mutant rs557° allele (Fig. 5, C and D). In
line with our findings in human LGG, we also 
found that the murine rs55705857 locus is 
marked by H3K4me1 and H3K27ac (fig. S12E).

To extend these findings to humans, we performed
OCT2 ChIP-PCR on human heterozygous
rs55705857-A/G IDH1-mutant LGG cells.
ChIP-PCR followed by Sanger sequencing of 
the PCR amplicon revealed that OCT2 indeed 
preferentially binds the A allele (Fig. 5E). Toge- 
ther, these data show that the rs55705857-G 
risk allele disrupts binding of OCT transcription 
factors such as OCT2/4. 


rs55705857 regulates the Myc pathway and
physically interacts with the Myc locus 


We assessed whether the rs55705857 locus reg- 
ulates expression of nearby genes in primary 
NSC cultures isolated from homozygous rs557"S,
heterozygous rs557%°, and WT rs557“ littermate
mice. Whereas neighboring genes such
as Adcy8 or Pvt1 did not show any expression
differences, Myc and Asap1 showed increased 
expression in rs557"¢ and rs557°/¢ NSCs as 
well as in NSC-derived OPCs when compared 
with rs557“ WT cells (Fig. 5F and fig. S13A). 
rs557"¢ and rs557°/¢ NSCs and OPCs also 
exhibited increased MYC protein expression, 
and RNA-seq analysis showed increased ex- 
pression of MYC target genes in rs557"% and 
rs557°/¢ NSC cultures (Fig. 5G and fig. S13, B
and C). 

In line with these data, we found elevated 
MYC protein expression in the NSC-enriched 
subventricular zone of 3-week-old rs557Y Idh1/
p53/Atrx-mutant brains compared with littermate
control rs557“/“ brains (fig. S14). Similarly,
we found increased MYC expression in
rs557"% and rs557° Idh1/p53/Atrx-mutant brain
tumors compared with rs557“ control tu- 
mors (fig. S15, A and B). RNA-seq followed by 
GSEA identified increased Myc mRNA as 
well as increased expression of gene sets 
specifically associated with MYC, interferon 
gamma and alpha responses, IL6/JAK/STAT 
responses, EMT, and hypoxia in rs557"¢ com- 
pared with rs557“ control tumors (fig. S15, C 
to E), reminiscent of the gene sets we found 
differentially regulated in human rs55705857- 
G LGGs (Fig. 1D). 

To further test whether the rs55705857 locus 
regulates expression of nearby genes in glioma 
cells, we first performed CRISPR interference 
(CRISPRi) targeting the rs55705857 locus in 
RIP cells, which led to reduced expression of 
Myc and other neighboring genes (fig. S16A). 
Next, we established isogenic RIP cells, where 
the remaining WT allele was also mutated by 
CRISPR-Cas9 (fig. S16B). RIP cells with two
mutant rs55705857 alleles compared with RIP
cells harboring one WT rs55705857 allele exhibited
modestly but significantly increased Myc
expression (P = 0.029) (fig. S16C), indicating
that the rs55705857-A allele functions to repress
Myc expression. Notably, forced expression
of Myc in Idh1/p53/Atrx-mutant brains
resulted in significantly accelerated tumor formation
(P < 0.0001) (fig. S16, D to F) comparable
to rs557°¢ and rs557°°"? tumors (Fig.
4A), indicating that Myc is a bona fide oncogene
in IDH-mutant LGG.

To test whether the rs55705857 locus regu- 
lates MYC expression in human cells, we first 
performed MYC reporter assays in 293T cells. 
Consistent with previous data (8), we found 
that the rs55705857-G risk allele had a stronger 
transactivating capability than the rs55705857-A
allele (fig. S17A). To elucidate whether OCT4 
binding to the rs55705857-A locus alters en- 
hancer activity, we concomitantly overexpressed 
OCT4, which resulted in a significantly decreased 
enhancer activity (P < 0.001) (fig. S17A), further 
supporting the notion that OCT transcription 


factor binding to the A allele represses MYC 
transactivation. Next, we generated several iso- 
genic human rs55705857-A/A and rs55705857- 
G/G induced pluripotent stem cell (iPSC) lines. 
Cerebral organoids established from these iso- 
genic iPSCs did not show any overt phenotype, 
but risk allele-containing organoids had in- 
creased MYC expression compared with the 
nonrisk organoids (fig. S17, B and C). 

To investigate a potential interaction of 
rs55705857 with the MYC promoter, we first
mined genome architecture mapping data from 
murine brain (32). This revealed a strong inter- 
action between the rs55705857 locus and Myc 
in oligodendroglia [oligodendrocytes and their 
precursors (OLGs)] but not in terminally differ- 
entiated pyramidal glutamatergic neurons 
(PGNs), dopaminergic neurons (DNs), or mouse 
embryonic stem cells (mESCs) (Fig. 5H and fig. 
S18A). Consistent with our data showing that 
the rs55705857-A nonrisk allele suppresses Myc, 
the rs55705857-Myc interaction in OLGs was 
associated with closed chromatin and lack of 
Myc expression, whereas mESCs showed open 
chromatin and Myc expression (fig. S18A). To 
further support an rs55705857-G allele regulat- 
ing Myc expression, we used a circular chromo- 
some conformation capture assay (4C-seq), which 
revealed that Idh1^R132H-mutant RIP tumor
cells as well as rs557“/" mouse neuronal stem 
cells exhibit a stronger interaction between 
the Myc promoter and the rs55705857 locus 
than do rs557“““ control NSCs (Fig. 5I and fig.
S18B). To extend these findings to humans, we 
analyzed Hi-C interaction data from healthy 
human hippocampus and dorsolateral pre- 
frontal cortex (33, 34), which showed inter- 
actions of the rs55705857 locus from both the 
rs55705857 and MYC perspective, including
the MYC promoter, PVT1, and several other
loci between the two regions (fig. S19). Together,
these data support a model in which the
rs55705857-G allele abrogates OCT2/4 binding
within a conserved enhancer element, allowing
it to interact with the MYC promoter and
upregulate MYC expression.


Discussion 


By comprehensively profiling a large cohort 
of LGG, we found that rs55705857 itself is the 
causal risk variant and lies within a conserved 
OCT transcription factor binding motif within
a brain-specific enhancer, which is hyperactivated
in IDH-mutant LGGs. It is well known
that 2-HG produced by mutant IDH competitively
inhibits histone lysine demethylases
such as KDM6A/B, resulting in regional variation
in histone modification, including areas
of decreased and increased H3K27ac and
H3K4me1 and enhancer activity (11-13). The
region surrounding rs55705857 is clearly an
area of increased enhancer activity specifically
in IDH-mutant tumors. The hyperactive chromatin
status combined with the tissue specificity
of this enhancer thus explains the cooperativity
between mutant IDH1/2 and rs55705857 and
why rs55705857-G is associated specifically with
IDH1/2-mutant glioma but not other brain can- 
cers (fig. S20). 

We found that the rs55705857 locus func- 
tions as an enhancer not only in the brain 
but also in melanocytes. Five percent of melanomas
harbor IDH1^R132 hotspot mutations. An
increased risk of melanoma in patients with 
glioma is well documented and is thought to 
result from common genetic predispositions. 
Germline deletion of the INK4 locus and alter- 
ations in telomere maintenance are associated 
with the melanoma-astrocytoma syndrome 
(35-38). It will be interesting to assess whether 
the rs55705857-G risk allele also increases sus- 
ceptibility to melanoma. 

Mechanistically, we show that the rs55705857-G
risk allele abrogates OCT2/4 binding to this
enhancer and exhibits increased physical inter- 
actions with the MYC promoter and increased 
MYC transcription, indicating that OCT2/4 
binding the nonrisk rs55705857-A locus restricts 
MYC expression (fig. S20). In addition to its well- 
known functions in activating transcription, 
OCT4 has been shown to act as a repressor of 
lineage-specific transcription during early em- 
bryonic development (39, 40). OCT2 is also a 
recognized transcriptional repressor and known 
to regulate neuronal differentiation (41). Given
that all eight OCT transcription factor family 
members share the exact same DNA binding 
motif and are expressed in LGG, it is likely that 
other OCT transcription factors also interact 
with the rs55705857-A locus to regulate MYC.
While we showed that the rs55705857-G allele 
enhances the expression of MYC and MYC 
targets, rs55705857 may also interact with 
genes other than MYC in cis or trans (such as
ASAP1) and act through them in modulating
tumor growth. In fact, the GSEA in human 
LGG demonstrates that the rs55705857-G risk 
allele reinforces the biological pathways driving 
gliomagenesis, whereas the association between 
the rs55705857-G allele and MYC expression 
in human LGG was relatively weak (P = 0.141). 
This observation may indicate that the effect 
of rs55705857-G allele on MYC may be more 
prevalent during tumor initiation and less 
pronounced in clinically overt tumors. In ad- 
dition, we were only able to assess MYC expres- 
sion in 55 LGG patients with known rs55705857 
status, clearly indicating that future studies with 
bigger patient cohorts should be performed. 

To model IDH1-mutant glioma, we used conditional
Idh1^R132H knock-in mice and generated
tumors by injecting Cre into newborn mice, 
which suggests that the initiation of human 
LGG can occur very early in life and is consistent 
with the diagnosis of IDH-mutant glioma in
children starting at the age of 14 years (42). Given
the slow growth of LGG, it is conceivable that 
these tumors may indeed initiate undetected
in early childhood. In line with previous data
(23, 24, 43), Idh1^R132H alone is not sufficient to
induce gliomagenesis in mice. This is now
supported by the findings of Ganz et al., which
show that clonal oncogenic IDH1 mutations
can be found in healthy human brains (44).
Even combining Idh1^R132H with the other
strong LGG driver mutations such as Trp53
and Atrx loss merely led to a low-penetrant
tumor phenotype with long latency. Thus, we 
hypothesized that certain noncoding germline 
susceptibility variants such as the rs55705857 
SNP may increase penetrance and accelerate 
cancer development. 

To assess the importance of the rs55705857 
SNP, we generated mouse lines with targeted 
CRISPR-Cas9 mutagenesis of the orthologous 
murine rs55705857 locus. Genetic ablation of 
66 base pairs encompassing the region orthol- 
ogous to rs55705857 (thereby removing the 
OCT binding motif), knocking in the G risk 
allele (together with a 4-bp insertion destroy- 
ing the PAM site) or somatic CRISPR-Cas9- 
mediated rs55705857 mutagenesis drastically 
decreased latency and increased tumor penetrance
in the context of mutant Idh1^R132H,
Trp53, and Atrx. Although these strains do 
not perfectly mimic the SNP, as it is technically 
very challenging to generate a “scarless” A>G 
1-bp knock-in allele, these data clearly show 
that the locus is important for gliomagenesis. 
Together with the fine-mapping of the risk 
allele in human LGG, the differential affinity 
of the risk allele for OCT2/4 transcription 
factors, the two rs55705857-A versus -G mouse 
reporter strains, and the rs55705857 G/G knock- 
in cerebral organoid data, our results strongly 
suggest that rs55705857 is functional and the 
causative allele. 

Although several other germline SNPs are 
associated with the development of LGG, 
rs55705857 confers by far the greatest risk 
above and beyond combinations of the other 
LGG risk loci (3-5, 45, 46). However, the 
molecular basis for the rs55705857-LGG asso- 
ciation was unknown. Here, we reveal a func- 
tional link between the rs55705857 germline 
variants, OCT-mediated regulation of MYC expression,
and the development of IDH-mutant
LGG. Our model helps to further understand
the biology of IDH-mutant gliomas and explains
much of the inherited risk of developing
these tumors. Additionally, we have
developed a faithful preclinical model that can
be used to assess potential therapeutic avenues
for IDH-mutant glioma.


REFERENCES AND NOTES 

1. A. Sud, B. Kinnersley, R. S. Houlston, Nat. Rev. Cancer 17, 
692-704 (2017). 

2. D. J. Schaid, W. Chen, N. B. Larson, Nat. Rev. Genet. 19, 
491-504 (2018). 

3. R. B. Jenkins et al., Cancer Genet. 204, 13-18 (2011). 

4. R. B. Jenkins et al., Nat. Genet. 44, 1122-1125 (2012).

5. J. E. Eckel-Passow et al., N. Engl. J. Med. 372, 2499-2508 
(2015). 


6. B. S. Melin et al., Nat. Genet. 49, 789-794 (2017).

7. J. E. Eckel-Passow et al., Neuro-Oncology 22, 1602-1613 
(2020). 

8. Y. Oktay et al., Sci. Rep. 6, 27569 (2016). 

9. L. Dang et al., Nature 462, 739-744 (2009). 

10. F. Fack et al., EMBO Mol. Med. 9, 1681-1695 (2017).

11. C. Lu et al., Nature 483, 474-478 (2012).

12. R. A. Cairns, T. W. Mak, Cancer Discov. 3, 730-741
(2013).

13. S. Chang, S. Yim, H. Park, Exp. Mol. Med. 51, 1-17 (2019).

14. H. Suzuki et al., Nat. Genet. 47, 458-468 (2015).

15. D. N. Louis et al., Acta Neuropathol. 131, 803-820 (2016).

16. M. R. Corces et al., Science 362, eaav1898 (2018).

17. O. Lancho, D. Herranz, Trends Cancer 4, 810-822 (2018).

18. E. Z. Kvon et al., Cell 180, 1262-1271.e15 (2020).

19. M. Sasaki et al., Nature 488, 656-659 (2012).

20. Y. Stein, R. Aloni-Grinstein, V. Rotter, Carcinogenesis 41, 
1635-1647 (2020). 

21. T. Jacks et al., Curr. Biol. 4, 1-7 (1994). 


22. K. P. Olive et al., Cell 119, 847-860 (2004). 

23. C. Bardella et al., Cancer Cell 30, 578-594 (2016). 

24. C. J. Pirozzi et al., Mol. Cancer Res. 15, 507-520 (2017). 

25. J. Hsieh, Genes Dev. 26, 1010-1021 (2012). 

26. A. L. Ferri et al., Development 131, 3805-3819 (2004). 

27. H. Suh et al., Cell Stem Cell 1, 515-528 (2007). 

28. R. L. Reeve, S. Z. Yammine, C. M. Morshead, D. van der Kooy, 
Stem Cells 35, 2071-2082 (2017). 

29. N. I. Park et al., Cell Stem Cell 21, 411 (2017).

30. H. Bulstrode et al., Genes Dev. 31, 757-773 (2017). 

31. H. Ikushima et al., J. Biol. Chem. 286, 41434-41441 (2011). 


32. W. Winick-Ng et al., Nature 599, 684-691 (2021). 

33. D. Yang et al., Nucleic Acids Res. 46, D52-D57 (2018). 

34. I. Jung et al., Nat. Genet. 51, 1442-1449 (2019).

35. P. M. Scarbrough, I. Akushevich, M. Wrensch, D. Il'yasova,
Ann. Epidemiol. 24, 469-474 (2014).

36. A. K. Chan et al., Clin. Neuropathol. 36, 213-221 (2017).

37. P. J. Killela et al., Proc. Natl. Acad. Sci. U.S.A. 110, 6021-6026
(2013).

38. M. N. Bainbridge et al., J. Natl. Cancer Inst. 107, 384 (2014). 

39. H. Niwa et al., Cell 123, 917-929 (2005). 

40. G. J. Pan, Z. Y. Chang, H. R. Schöler, D. Pei, Cell Res. 12,
321-329 (2002). 

41. E. Theodorou et al., Genes Dev. 23, 575-588 (2009). 

42. S. Ryall, U. Tabori, C. Hawkins, Acta Neuropathol. Commun. 8, 
30 (2020). 

43. M. Sasaki et al., Genes Dev. 26, 2038-2049 (2012). 

44. J. Ganz et al., Cancer Discov. 12, 172-185 (2022). 

45. K. Labreche et al., Acta Neuropathol. 135, 743-755 (2018). 

46. M. Wrensch et al., Nat. Genet. 41, 905-908 (2009). 


ACKNOWLEDGMENTS 


We thank all members of our laboratories as well as The Centre for
Phenogenomics (TCP) for helpful discussions. We thank the
staff of the Epigenomics Development Laboratory and Recharge
Center (EDL) at Mayo Clinic for carrying out the epigenomic
assays. The EDL is supported in part by the Mayo Clinic Center for
Individualized Medicine. We acknowledge study participants, the
clinicians, and research staff at the participating medical
centers and the University of California, San Francisco (UCSF)
Neurosurgery and Mayo NeuroOncology tissue banks. We also
thank the following colleagues at the Mayo Clinic and UCSF who
facilitated subject recruitment and collection and curation of
subject data and preparation of reagents: M. Bublitz, J. Buckner,
T. Burns, A. Caron, C. Giannini, C. Halder, B. O'Neill, I. Parney,
C. Praska, A. Ragunathan, G. Sarkar, J. Sarkaria, M. Berger, P. Bracci,
S. Chang, H. Hansen, L. McCoy, A. Molinaro, M. Prados, T. Rice,
T. Tihan, and J. Wiemels. We thank L. Penn and C. Redel for helping
design the MYC ChIP-seq experiments. Atrx^fl mice were kindly
provided by D. Picketts. The Idh1^R132H antibody (#456R-31) was a
generous donation of MilliporeSigma. We especially thank the
many other neurosurgeons at Mayo who collected, over several
years, the tissues used in this study. Funding: D.S. is a recipient of
a Career Development Award from the HFSP (CDA00080/2015).
S.K.L. is a recipient of a Canadian Breast Cancer Fellowship (BC-F-
16#31919). This work was conducted with support of the Ontario
Institute for Cancer Research through funding provided by the
Government of Ontario and a Brain Tumour Foundation of Canada
Brain Tumour Research Grant. Work at UCSF was supported by
the National Institutes of Health (grants R01CA52689, P50CA097257,
R01CA139020, R01CA119215, and R01CA207360) and by the loglio
Collective, the Stanley D. Lewis and Virginia S. Lewis Endowed
Chair in Brain Tumor Research (M.W.), and the Robert Magnin
Newman Endowed Chair in Neuro-oncology. R.B.J. and the work at
Mayo were supported by National Cancer Institute (NCI) grants
CA230712, P50 CA108961, and CA139020; the National Brain 
Tumor Society; the loglio Collective; the Mayo Clinic; and the 
Ting Tsung and Wei Fong Chao Foundation. A.Ab. and A.Pa. were 
supported by NCI grant U24CA220242 and the Mayo Center for 
Individualized Medicine. Work at Lawrence Berkeley National 
Laboratory was supported by National Institutes of Health grants 
R01HG003988 (to L.A.P.) and R00HG009682 (to E.Z.K.) and was
performed under US Department of Energy Contract DE-AC02-05CH11231
to the University of California. A.Po. acknowledges
support from the Helmholtz Association (Germany). A.Po. and 
W.W.-N. were supported by the Deutsche Forschungsgemeinschaft 
(DFG; German Research Foundation) under Germany's Excellence 
Strategy EXC-2049-390688087. Author contributions: C.Y. 
performed all mouse experiments. K.L.D., together with T.M.K., 
performed all the analysis of the human glioma samples. 

R.T. performed all mouse ChIP-PCR and reverse transcription- 
PCR experiments; W.W.-N. and A.Po. analyzed the GAM data; 
M.Li. helped with 4C experiments; J.P. helped with metabolomics; 
S.B.D.L. performed the 293T reporter assays; J.J.H. and J.T.G. 
helped with MYC reporter assays; J.W.L.B., C.L., and P.G.M. helped 
to assess intrachromosomal interactions; T.M.K. performed the
fine-mapping analyses of the 8q24 region; and A.Al. and T.M.K.
performed the Mayo and TCGA GSEA analyses. P.A.D. and M.L.K. 
performed all the Mayo Clinic and TCGA human RNA-seq 

studies and statistical analyses; A.M., M.G., and K.C. performed 
bioinformatic mouse analysis; L.W., A.Ab., and A.Pa. helped with the 
ATAC-seq, ChIP-seq, and RNA-seq analyses; J.B., A.M., D.T., and J.Wr. 
helped with mouse experiments; S.K.L. and K.N.A.-Z. helped with 
CRISPR technologies; K.A.M., J.F. and B.C. helped with establishing 
primary cell lines; P.M., M.Lu., M.A., and H.H.H. performed IGR 
and motif analysis; L.Z. and A.E. performed all histology; J.W.D. 
supervised metabolomics; M.D.W. supervised 4C experiments; 
T.M. provided the Idh1^R132H-mutant mice; D.H.L. supervised the
collection of the Mayo Clinic specimens and clinical data; M.W. and 
J.Wi. provided the UCSF 8q24 case and control genotyping data; 
L.A.P., D.E.D., and A.V. supervised the reporter knock-in mice 
experiments; M.D.T. and P.D. provided cell lines and experimental 
guidance; D.J.M. helped with design of the MYC ChIP-seq 
experiments; G.Z. and L.J. performed R/S-2HG MS; J.E.E.-P. 
supervised the statistical analysis of the Mayo Clinic human data; 
L.A. supervised organoid experiments; C.M.I. performed histology 
analysis; and E.Z.K., with E.W.H. and S.J., performed and analyzed 
the reporter knock-in mice experiments. D.S. and R.B.J. designed
the experiments and coordinated the project and, together with
C.Y. and K.L.D., wrote the manuscript. Competing interests: A.Po. 
holds a patent on GAM: A. Pombo, P. A. W. Edwards, M. Nicodemi, 
A. Scialdone, and R. A. Beagrie, Genome architecture mapping, 
International Patent PCT/EP2015/079413 (2015). D.S. is working as
a consultant for Tango Therapeutics outside of the submitted 
work. Data and materials availability: The UCSF and Mayo Clinic 
genotyping data are available through dbGap accession numbers 
phs001497.v2.p1 and phs003041.v1.p1:
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001497.v2.p1
and
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs003041.v1.p1,
respectively. All ChIP-seq and RNA-seq data for human glioma are available at
NCBI Gene Expression Omnibus (GEO) under accession no. GSE167806:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE167806. All RNA-seq
for mouse glioma and MYC ChIP-seq data of human glioma PDX are
available at NCBI GEO under accession no. GSE172391:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE172391. 4C-seq
data relating to Fig. 5H are available at NCBI GEO under accession
no. GSE172390:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE172390.
License information: Copyright © 2022 the
authors, some rights reserved; exclusive licensee American
Association for the Advancement of Science. No claim to original
US government works.
https://www.science.org/about/science-licenses-journal-article-reuse


SUPPLEMENTARY MATERIALS 


science.org/doi/10.1126/science.abj2890 
Materials and Methods
Figs. S1 to S21

Tables S1 to S4 

References (47-81) 

MDAR Reproducibility Checklist 


Submitted 4 May 2021; resubmitted 22 June 2022 
Accepted 8 September 2022 
10.1126/science.abj2890 




METALLURGY 


Machine learning-enabled high-entropy alloy discovery


Ziyuan Rao, Po-Yen Tung, Ruiwen Xie, Ye Wei*, Hongbin Zhang, Alberto Ferrari, T.P.C. Klaver,
Fritz Körmann, Prithiv Thoudden Sukumar, Alisson Kwiatkowski da Silva, Yao Chen,
Zhiming Li, Dirk Ponge, Jörg Neugebauer, Oliver Gutfleisch, Stefan Bauer, Dierk Raabe*


High-entropy alloys are solid solutions of multiple principal elements that are capable of reaching 
composition and property regimes inaccessible for dilute materials. Discovering those with valuable 
properties, however, too often relies on serendipity, because thermodynamic alloy design rules alone often 
fail in high-dimensional composition spaces. We propose an active learning strategy to accelerate the 
design of high-entropy Invar alloys in a practically infinite compositional space based on very sparse data. 
Our approach works as a closed-loop, integrating machine learning with density-functional theory, 
thermodynamic calculations, and experiments. After processing and characterizing 17 new alloys out of 
millions of possible compositions, we identified two high-entropy Invar alloys with extremely low thermal 
expansion coefficients around 2 × 10⁻⁶ per degree kelvin at 300 kelvin. We believe this to be a suitable
pathway for the fast and automated discovery of high-entropy alloys with optimal thermal, magnetic,
and electrical properties.


Alloy design refers to a knowledge-guided
approach to the development of high-performance
materials. The strategy was
established in the Bronze Age and has
undergone further developments since
that time. Alloy design is the basis for the 
development of different materials that en- 
able technological progress. Several thousand 
metallic alloys have been developed so far 
that serve in engineering applications. The 
first essential alloy groups developed, such 
as bronze and steel, are all based on one main 
element that forms the matrix of the material. 
Over time, alloys with a higher number of al- 
loying elements in larger fractions, such as 
austenitic stainless steels, have been devel- 
oped. Today, with the development of high- 
entropy alloys (HEAs), we have reached a 
stage where multiple elements are used in 
similar fractions (1, 2). Considering only the
most used elements of the periodic table, this 
spans a composition space of at least 10⁵⁰ alloy
variants, a space so large that it cannot be 
managed by conventional alloy design methods 
(3). These conventional methods for designing 
alloys, which have been applied to small sub- 
spaces of the HEA composition realm, include 
calculation of phase diagrams (CALPHAD) and 
density-functional theory (DFT) (4-6). However, 
CALPHAD provides equilibrium-phase diagrams
only, and DFT is computationally costly
and cannot be readily applied to higher tem- 
peratures and disordered alloys (5, 7). Like- 
wise, combinatorial experiments (8) are very 
labor intensive and only cover the limited com- 
position space of HEAs. 

Because of these methodological limitations 
to finding materials with promising functional 
and mechanical features, we present a differ- 
ent approach to accelerating the discovery of 
HEAs. We based our approach on the use of 
machine learning (ML) techniques, with a 
focus on probabilistic models and artificial 
neural networks. Limited by the amount of 
available composition-property data, conven- 
tional ML approaches in alloy design have to 
predominantly rely on simulation data, often 
with only limited experimental validation 
(9, 10). As the experimental microstructure 
database continues to expand, ML obtains 
higher accuracy in predicting the phase or 
microstructure of materials (11). However,
the direct composition-property prediction is 
still elusive because of the comparably small 
databases and the human bias in feature se- 
lection. Recently, active learning has emerged 
as an alternative choice for functional mate- 
rials discovery (12). Active learning is a subfield 
of ML in which surrogate models iteratively 
select unseen data points that are most in- 
formative to improve the predictive power of 
the models (13). In this approach, the next set 
of experiments is guided by the previous model 
trained based upon the results seen so far, 
yielding data points that will again be used 
iteratively for updating the model. Active learning 
has the potential to reduce the computational 
costs of alloy design and to both incorporate 
and guide experimental data and routines. 
Fig. 1. Approach overview. We developed an active learning framework for
the targeted composition design and discovery of HEAs, which combines ML
models, DFT calculations, thermodynamic simulations, and experimental
feedback. First, the promising candidates are generated under the HEA-GAD
framework consisting of two primary steps: (i) an autoencoder for composition
generation and (ii) stochastic sampling for composition selection. Second,

However, active learning approaches to guiding
the experimental discovery of materials
have relied on simple surrogate models and
Bayesian optimization methods, which are
limited to low-dimensional data, thus showing
property improvements only after many iterations
(14, 15).
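
The iterative loop described in this paragraph (train a surrogate, pick the most promising unseen points, measure them, retrain) can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the one-dimensional "composition" variable, the `measure` function standing in for an experiment, the bootstrap nearest-neighbour ensemble used as a surrogate, and the lower-confidence-bound score. It is not the surrogate model or selection policy used in this work; only the batch size of three per iteration and the six iterations echo numbers mentioned in the figure captions.

```python
import random
import statistics


def measure(x):
    """Stand-in for an experiment: a hypothetical property to minimize."""
    return (x - 0.3) ** 2


def fit_ensemble(data, n_models=20, rng=None):
    """Toy surrogate: bootstrap resamples, each used as a 1-nearest-neighbour model."""
    rng = rng or random.Random(0)
    return [[rng.choice(data) for _ in data] for _ in range(n_models)]


def predict(ensemble, x):
    """Mean and spread of the ensemble predictions at x."""
    preds = [min(sample, key=lambda d: abs(d[0] - x))[1] for sample in ensemble]
    return statistics.mean(preds), statistics.pstdev(preds)


def active_learning(n_iter=6, batch=3, kappa=1.0, seed=0):
    """Run the loop and return the best (x, y) pair measured so far."""
    rng = random.Random(seed)
    pool = [i / 200 for i in range(201)]                 # unseen candidates
    data = [(x, measure(x)) for x in (0.0, 0.5, 1.0)]    # sparse initial data
    for _ in range(n_iter):
        ensemble = fit_ensemble(data, rng=rng)
        seen = {x for x, _ in data}

        def score(x):
            mean, std = predict(ensemble, x)
            return mean - kappa * std    # favour low predictions and high uncertainty

        ranked = sorted((x for x in pool if x not in seen), key=score)
        for x in ranked[:batch]:                         # "measure" the top candidates
            data.append((x, measure(x)))
    return min(data, key=lambda d: d[1])


if __name__ == "__main__":
    print(active_learning())
```

The `kappa` parameter trades exploitation (low predicted value) against exploration (high ensemble disagreement); setting it to zero makes the loop purely greedy.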

To overcome these obstacles, we propose an 
active learning framework for the composition 
discovery of HEAs that is efficient for very 
sparse experimental datasets. The approach 
comprises ML-based techniques, DFT, mean- 
field thermodynamic calculations, and experi- 
ments. We focused on the design of high-entropy 
Invar alloys with a low thermal expansion 
coefficient (TEC) for several reasons: (i) a high 
demand exists for different types of Invar al- 
loys to serve emerging markets for the transport 
of liquid hydrogen, ammonia, and natural gas; 
(ii) the mechanical properties of the original 
Fe63.5Ni36.5 (wt %) alloy for which Charles
Edouard Guillaume received the 1920 physics 
Nobel Prize leave room for improvement; (iii) 
alternative Invar alloys (e.g., intermetallic, 
amorphous, or antiferromagnetic Invar com- 
pounds) come at forbiddingly high alloy costs 
and/or poor ductility (16, 17); (iv) although a 
few HEAs have the potential to fill this gap
(18-20), the lowest TEC (~10 × 10⁻⁶ K⁻¹) of
HEAs reported in the literature exceeds the
value of the original Fe63.5Ni36.5 (wt %) alloy
(~1.6 × 10⁻⁶ K⁻¹) (19); and (v) our active learning
framework mainly considers compositional
information instead of the alloy manufacturing 
process, which makes the Invar effect an ideal 
target because these alloys are mostly determined 
by composition and less by processing (6, 19) 
(see fig. S1 and table S1 for more background). 
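
To put the TEC values quoted above in perspective, the length change of a part follows ΔL = α · L · ΔT when the coefficient α is treated as constant over the temperature interval. The sketch below applies that relation to a bar; the 1 m length and the 100 K temperature change are arbitrary illustrative choices, and the constant-α assumption is a simplification because the TEC of Invar-type alloys varies with temperature.

```python
def linear_expansion(length_m: float, tec_per_k: float, delta_t_k: float) -> float:
    """Length change (in meters) for a constant thermal expansion coefficient."""
    return length_m * tec_per_k * delta_t_k


# TEC values as quoted in the text; bar length and temperature change are assumed.
for label, tec in [("Fe63.5Ni36.5 Invar", 1.6e-6), ("lowest-TEC HEA reported", 10e-6)]:
    dl_um = linear_expansion(1.0, tec, 100.0) * 1e6
    print(f"{label}: {dl_um:.0f} micrometers")
```

Under these assumptions the classic Invar bar grows by about 160 micrometers and the HEA bar by about 1000 micrometers, roughly a sixfold difference.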


Results and discussion 
Generative alloy design 


The active learning framework includes three 
main steps: targeted composition generation, 
physics-informed screening, and experimental 
feedback (Fig. 1). Considering the large num- 
ber of possible composition combinations of 
HEAs and the small experimental datasets 
(699 compositions; fig. S2), the challenge is to 
directly sample new compositions with the de- 
sired properties. Therefore, we developed an 
HEA generative alloy design (HEA-GAD) ap- 
proach that is based on a generative model 
(GM) (21). First, the HEA-GAD uses GM,
mathematical modeling, and sampling to per- 


the selected candidates from the HEA-GAD are further processed by the TERM
framework, which includes two ensemble models composed of multilayer 
perceptrons and gradient-boosting decision trees. In the last step, the most 
promising compositions are selected by a ranking-based policy. The top three 
candidates are experimentally measured and fed back to the database. The 
iteration is repeated until the discovery of Invar alloys. 


form a large-scale search of potential Invar 
alloys. GM learns an efficient and effective re- 
presentation of the high-dimension data, which 
not only provides direct data visual represen- 
tation, but also converts the search in high- 
dimensional design spaces to those of lower 
dimensionality (22). Different GMs are compared 
and analyzed on the basis of the evaluation 
metrics. The results show that the Wasserstein 
autoencoder (WAE) architecture performs better
than other models with similar architectures
(21) (figs. S3 and S4). The encoder takes
compositions of alloys as the input and learns 
to compress them down to low-dimensional 
representations, and the decoder can then act 
as a generator for producing alloy compositions 
given the learned continuous latent z representation.
Although WAE is trained with only
compositional information of alloys, it may im- 
plicitly include information on composition- 
related properties, which makes the latent space 
physically meaningful and informative. In our 
case, Invar alloys show extremely low TEC 
(hereafter used to refer to the TEC around 
room temperature unless otherwise specified) 
values, and the composition-TEC relation obeys
specific physical laws. Subsequently, HEA-GAD
uses the Gaussian mixture model (GMM) and 
Markov chain Monte Carlo (MCMC) sampling 
(23, 24) to perform a large-scale search for the 
Invar compositions generated from WAE latent 
representation. 
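
The sampling step described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes a two-dimensional latent space, replaces the fitted GMM with a hypothetical two-component mixture of isotropic Gaussians, and uses a plain random-walk Metropolis sampler as one simple form of MCMC. The function names, mixture parameters, and step size are all assumptions, and decoding the sampled latent points back into compositions is omitted.

```python
import math
import random

def gaussian_pdf_2d(z, mean, var):
    """Isotropic 2-D Gaussian density with variance `var` along each axis."""
    dx, dy = z[0] - mean[0], z[1] - mean[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * var)) / (2.0 * math.pi * var)

def mixture_density(z, components):
    """Density of a mixture given as a list of (weight, mean, var) tuples."""
    return sum(w * gaussian_pdf_2d(z, m, v) for w, m, v in components)

def metropolis_sample(components, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis sampler targeting the mixture density."""
    rng = random.Random(seed)
    z = components[0][1]  # start at the mean of the first component
    p = mixture_density(z, components)
    samples = []
    for _ in range(n_samples):
        proposal = (z[0] + rng.gauss(0.0, step), z[1] + rng.gauss(0.0, step))
        p_new = mixture_density(proposal, components)
        # Accept with probability min(1, p_new / p); otherwise keep the current point.
        if rng.random() < p_new / p:
            z, p = proposal, p_new
        samples.append(z)
    return samples

# Hypothetical mixture standing in for a GMM fitted to WAE latent points.
components = [(0.7, (0.0, 0.0), 1.0), (0.3, (3.0, 3.0), 0.5)]
latent_points = metropolis_sample(components, n_samples=5000)
```

In a full pipeline, each sampled latent point would be passed through the trained decoder to obtain a candidate composition.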


Two-stage ensemble regression 


Fig. 2. First and last (sixth) iterations of the HEA-GAD generation. (A and B) WAE latent space and GMM-modeled density of the first iteration. (C and D) WAE latent space and GMM-modeled density of the last iteration. The WAE latent space distribution of the different compositions is marked with different symbols. The colors of the data points in the latent space denote their corresponding TEC. The GMM shows the probabilistic density in the latent space. The new candidates proposed by the first stage of the TERM are marked by crosses, and the new compositions proposed by the second stage of the TERM are marked by circles. The learned latent spaces are informative of the TEC of the HEAs.

Next, we use the two-stage ensemble regression model (TERM) to further investigate the TEC of the HEA-GAD-generated alloy compositions. The first stage concerns composition-based regression models aiming at fast and large-scale composition inference. Then, the top ~1000 results with potentially low TEC from the HEA-GAD model are screened and enter the second-stage model, where DFT and thermodynamic calculations are included as part of the input, making it a physics-informed model (table S4). In the following section, we demonstrate that incorporating the physical inputs does increase the model accuracy. To increase robustness without sacrificing prediction accuracy, TERM combines the multilayer perceptron (25-27) and gradient-boosting decision tree approaches (28-30) into a single ensemble (31). Based on
prediction and uncertainty, the exploration 
and exploitation strategy is used to adaptively 
guide the discovery of desirable compositions 
(31). Exploration prefers the composition with 
higher uncertainty (curiosity), whereas ex- 
ploitation favors the composition with lower 
predicted TEC (perceived usefulness). Such a 
baseline strategy is premised on the model’s 
ability to generalize beyond the known data, 
which is, however, often hampered by the highly 
nonlinear nature of the composition-property 
relation and sparsity of the available dataset. 
To overcome this issue, we designed a rank- 
order strategy that allows predictions to be 
rearranged and ranked in a specific order 
(32, 33). This strategy is particularly advanta- 
geous when the underlying distribution of 
the data is largely unknown. The rank-based 
strategy ensures that the candidate selection 
is less affected by model inaccuracy and pro- 
vides a systematic way to combine model 
prediction and uncertainty (31). Finally, the
TEC values of the top three selected candidate 
materials are experimentally determined by 
the physical properties measurement system. 
These experimental results then augment the
training database for the next active learning
iteration. 
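
One way to picture a rank-based selection that balances exploitation and exploration is sketched below. It is an illustrative assumption, not the policy used in this work: candidates are ranked once by predicted TEC (lower is better) and once by predicted uncertainty (higher is better), the two ranks are averaged with a hypothetical weight `explore_weight`, and the top three are returned. All numerical inputs are made up.

```python
def rank_positions(values, reverse=False):
    """Return the rank (0 = best) of each entry after sorting `values`."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks

def select_candidates(predicted_tec, uncertainty, k=3, explore_weight=0.5):
    """Pick k candidate indices from a weighted combination of two rankings."""
    exploit_rank = rank_positions(predicted_tec)              # low TEC ranks first
    explore_rank = rank_positions(uncertainty, reverse=True)  # high uncertainty ranks first
    score = [(1.0 - explore_weight) * e + explore_weight * u
             for e, u in zip(exploit_rank, explore_rank)]
    # Ties are broken by candidate index so the result is deterministic.
    return sorted(range(len(score)), key=lambda i: (score[i], i))[:k]

# Hypothetical predictions (x10^-6/K) and uncertainties for five candidates.
predicted_tec = [3.4, 4.6, 2.0, 6.0, 5.0]
uncertainty = [1.3, 4.1, 0.5, 0.1, 3.0]
chosen = select_candidates(predicted_tec, uncertainty)
```

Because only the ordering of the predictions enters the score, a systematic bias in the regression model that preserves that ordering leaves the selection unchanged.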


Compositional latent space distribution 


We produced a large benchmark dataset with 699 data points of Invar alloys, mainly from previous publications (fig. S2 and table S3) (34-39). Then, on the basis of the HEA-GAD-TERM framework proposed above, we performed six iterations and cast 18 alloys, including 17 new alloys and one classic Fe63.5Ni36.5 (wt %) Invar alloy as a reference alloy. Because of the data
imbalance (figs. S5 and S6), the discovery of 
FeNiCoCrCu HEAs is much more difficult 
than the discovery of FeNiCoCr HEAs. For this 
reason, we focused on the design of FeNiCoCr 
HEAs for the first three iterations and on 
FeNiCoCrCu HEAs for the last three iterations. 
We show the WAE latent space and GMM- 
modeled two-dimensional probability density 
of the first iteration in Fig. 2, A and B. The la- 
tent space yields certain islands that indicate 
the compositional differences. For example, 
the HEAs tend to stay in the middle, whereas the binary and ternary alloys tend to stay at the edges of the latent space. Also, a smooth transition among the Fe-Ni, Ni-Co binary alloys

Fig. 3. Importance of the physics-informed descriptors. (A to C) Correlation between the proposed descriptor ω_s/T_c and the experimental TEC. (D) Schematic model of the Masumoto empirical rule for discovering Invar alloys. (E) Comparison of training and testing history with and without use of the descriptor ω_s/T_c. Both the final training and testing errors decrease after considering the physics-informed descriptors; for example, the testing error decreases from 19 to 14%.


Table 1. Compositions and TEC of the HEAs designed in this work.* 


Alloys   Iteration   Fe (wt %)   Ni (wt %)   Co (wt %)   Cr (wt %)
Al Ist Bor 23.9 16.7 42 


Cu (wt %)   Predicted TEC (×10⁻⁶/K)   Predicted uncertainty (×10⁻⁶/K)   Experimental TEC (×10⁻⁶/K)
0 3.41 1.29 7.54 


12.6 37.7 7.3 


0 4.58 4.09 


Bl Ath 40 6.9 18) 1) 


ai) Heo, 1.45 5.84 


51.6 6.8 2,5) 78 


6.3 Qe 3.49 9.68 


50.7 


19) 15.8 mS) 


*The original Fe63.5Ni36.5 Invar (A6) is a reference alloy and is not listed here.

and the Fe-Ni-Co ternary alloys can be observed. FeNiCoCrCu forms a single island, indicating that features of compositions with nonzero Cu content are indeed captured by HEA-GAD. The new FeCoNiCr HEA candidates are cross-marked, whereas the best-ranked HEAs are illustrated with white dots in Fig. 2, A and B. We also show the last iteration result of FeCoNiCrCu HEAs discovered by HEA-GAD-TERM in Fig. 2, C and D (in red). The entire latent space is slightly rotated because of the addition of new data into the training dataset from previous iterations. The augmented dataset also leads to a modified GMM-modeled probability density shown in Fig. 2D, in which

Fig. 4. Analysis of the results after six iterations in the active learning loop. (A and B) Representation of the alloy discovery process in the ideal scenario and the real world. (C and D) Cr and Cu distribution histograms. The Cr concentrations vary (from 0 to 20%). By contrast, the vast majority (>95%) of the compositions have zero Cu concentration. The lowest known TEC as a function of composition is plotted as a solid line, and the unknowns are shown as a dashed line. Gray arrows illustrate the discovery paths of HEA-GAD-TERM. (E) Experimental and predicted TEC of the FeNiCoCr and FeNiCoCrCu HEAs. (F) MAPE of active learning. Dots represent the MAPE between experiment and predictions. Rapid decrease of the MAPE is akin to a natural learning process. (G and I) Electron backscatter diffraction (EBSD) phase and boundary maps of the A2 alloy. (H and J) EBSD phase and boundary maps of the A3 alloy. (K and L) Change of lattice constants with


temperature in the A2 [(Fe]_,Fes)so1(Nil_,Ni,)16.7(Cof_,Co})o6.1(Cry_,Cr})71] 
and A3 [(Fe]_,Fet)a27(Ni}_,Nit)oa(Coj_,Co')39.5(Crq_,Cr1)s.7] alloys 

for different values of n, where n denotes the pseudo-alloy concentration

(0 < ys 0.50).

the left Gaussian ellipse extends more to the left
region compared with Fig. 2B. Such pheno- 
mena suggest that the HEA-GAD-TERM frame- 
work is interpretable and sensitive to the dataset. 


Physics-descriptor-informed model


So far, the Masumoto empirical rule (34, 35) has played an important role in the discovery of several Invar alloys. As exemplified in Fig. 3D for the Fe63Ni32Co5 (wt %) Super Invar alloy, according to this rule, the TEC is related to the ratio ω_s/T_c (magnetostriction/Curie temperature): Because of the Invar effect, Invar alloys have lower TEC in the ferromagnetic state (below the Curie temperature T_c) than in the paramagnetic state (above T_c). The TEC in the ferromagnetic state can thus be estimated as


$$\mathrm{TEC} = \tan\theta = \frac{QS}{RS} = \frac{QA - SA}{RS} = \frac{QA}{T_c} - \frac{SA}{T_c} = \frac{QA}{T_c} - \frac{\omega_s}{T_c}$$


We demonstrated the correlation between ω_s/T_c and the experimental TEC with DFT and CALPHAD for FeCoNi alloys. The alloys from our experimental dataset were slowly cooled from high-temperature homogenization, so an equilibrium temperature to calculate the phase fractions in our samples cannot be determined unambiguously. We nevertheless calculated ω_s/T_c for the annealing temperatures T_ann = 1278 K, 1073 K, and 873 K (Fig. 3, A to C) and observed a good correlation with the experimentally observed TEC values, especially for the values taken at T_ann = 873 K. ω_s and T_c are thus useful descriptors that can be exploited to increase the accuracy of TERM. We show the comparison of the model training history with and without the use of the descriptor ω_s/T_c (Fig. 3E). This history reflects the performance


evolution with time (epoch) as more data were 
fed to the model. The final testing error was 
notably reduced from 0.19 to 0.14 upon in- 
clusion of DFT and CALPHAD data, a piece of 
strong evidence that the physics-descriptor- 
informed model can achieve better accuracy 
than that based only on compositions. 


Learning curve and thermal expansion behavior 


We show the measured and predicted TEC 
values of the 17 alloys experimentally measured 
in the six iterations in Table 1. A3 and A9 HEAs 
with four principal elements show extremely 
low TECs that are comparable to the classical 
Fe63.5Ni36.5 (wt %) binary Invar alloy. B2 and
B4 HEAs with five principal elements show
TECs that are comparable to the commercially
used Fe54Co17Ni29 (wt %) ternary Kovar alloys.
In addition, a tabular comparison between 
HEA-GAD-TERM and trial and error can be 
found in table S2, where our method shows a 
fivefold higher discovery rate than that achieved 
by the trial and error approach alone. 

We illustrate the alloy discovery process in two scenarios (Fig. 4, A and B). In the ideal case, the composition-TEC curve is simple and convex, which means that this specific relation is readily learned and “never forgotten.” Even with a small dataset, the global minimum can be easily found regardless of the starting point: Both path 1 and path 2 can lead to the Invar point. In reality, however, the lowest TEC curve is highly nonlinear because of the complex underlying composition-property relations, and the composition landscape remains largely unknown. Both experts with appropriate domain knowledge and algorithms will have to explore the unknown territory and accumulate knowledge about the system by making mistakes. Furthermore, the composition axis is multidimensional, so the design space is huge. The chosen paths, available data, and starting points will therefore notably influence the final results. Path 1 may lead to local minima, whereas path 2 is rather difficult initially, and multiple high-TEC non-Invar HEAs can be discovered before the eventual Invar discovery.

We provide the concentration histogram of 
Cr and Cu in the current dataset in Fig. 4, C 
and D. We also plotted the observed lowest 
TEC curve to illustrate the discovery path in 
two HEAs. The GAD-TERM framework shows 
its high efficiency by quickly identifying the 
Invar points in the first iteration (A3 and B2). 
However, the algorithm is designed for exploration, so it inevitably discovers some non-Invar alloys along the path (e.g., A4 and A8, denoted by gray arrows in Fig. 4, C and D). As mentioned before, discovering FeNiCoCr HEAs and discovering FeNiCoCrCu HEAs are different tasks because of the different data distributions. The distribution of Cu in the alloys is extremely imbalanced (Fig. 4D); that is, the vast majority of the alloys in the dataset do not contain Cu at all, and only a few alloys have 5% Cu. Such a distributional difference likely accounts for the substantially different learning behavior observed (Fig. 4, E and F).

Fig. 5. Summary of the properties of the ML-designed HEAs. (A) TEC of the ML-designed HEAs as a function of the change in temperature. As a comparison, we plotted the thermal expansion curve of the HEAs and MEAs. A3 and A9 FeNiCoCr HEAs show extremely low TECs around 2 × 10⁻⁶/K at 300 K, so they can be used as Invar alloys. B2 and B4 FeNiCoCrCu HEAs show low TECs around 5 × 10⁻⁶/K at 300 K, which qualifies them as Kovar alloys. (B) Configurational entropy plotted against the TEC values for various known alloys and alloys discovered in this work. ML enables this approach to efficiently discover new alloys with excellent properties (high resistance to thermal cycles) in an infinite phase spectrum (compositionally complex alloys).

We show the measured and predicted TEC values for FeNiCoCr and FeNiCoCrCu HEAs in Fig. 4E and the mean absolute percentage error (MAPE) between experiments and predictions versus experimental iteration in Fig. 4F, with each exploitation and exploration step marked by arrows. For FeNiCoCr HEAs, the average experimental TEC value gradually decreases: 6.49 × 10⁻⁶ per degree kelvin (/K) in the first, 5.61 × 10⁻⁶/K in the second, and 3.65 × 10⁻⁶/K in the third iteration (Table 1). Exploration and exploitation take place alternately, akin to a natural learning process, and such a plot represents the “learning curve” of the HEA-GAD-TERM model. The learning curve indicates a progressive trend as the MAPE decreases notably (from 1.5 to 0.2). Because of the exploration step, the model predictions deviate considerably from their experimental counterparts in the first iteration. Alloy A3 (Table 1) has the highest predicted TEC value [(4.39 ± 0.79) × 10⁻⁶/K], but the experiment shows exactly the opposite, namely, the lowest measured TEC value (1.41 × 10⁻⁶/K). In the second and third iterations, the standard deviation of the experimental TEC values declines substantially (3.34 × 10⁻⁶/K and 1.46 × 10⁻⁶/K, respectively). This demonstrates excellent exploration progress in which HEA-GAD-TERM converges quickly and can predict TEC with high accuracy after only three iterations. Conversely, FeNiCoCrCu shows a different learning behavior. The discovery path shows no significant improvements, from an experimentally measured 6.26 × 10⁻⁶/K in the first iteration, to 6.64 × 10⁻⁶/K in the second, and 5.67 × 10⁻⁶/K in the third (for more numerical details, see Table 1). We can attribute this trend to the lack of Cu-containing FeNiCoCrCu data (only three data points are available at the beginning; Fig. 4D). Despite this shortcoming, the experimental mean deviation narrows from 33.9% for the first iteration to 10.2% in the last iteration, indicating a gradually improved model accuracy.
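
For reference, the MAPE quoted above can be computed as in the following sketch, which returns the error as a fraction (so 1.5 corresponds to 150%). The function name is an assumption; the single-alloy example uses the A3 values quoted in the text, and the two-alloy example uses made-up numbers.

```python
def mape(measured, predicted):
    """Mean absolute percentage error, returned as a fraction (1.0 = 100%)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - m) / abs(m) for m, p in zip(measured, predicted)]
    return sum(errors) / len(errors)

# Alloy A3: measured 1.41 and predicted 4.39 (both x10^-6/K), as quoted in the text.
a3_error = mape([1.41], [4.39])          # about 2.11, i.e. roughly 211%

# Hypothetical two-alloy example: errors of 50% and 25% average to 37.5%.
example_error = mape([2.0, 4.0], [1.0, 5.0])
```

Because the error is normalized by the measured value, alloys with very low TEC, such as A3, contribute disproportionately to the MAPE when they are mispredicted.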

To reveal the physical origin behind the properties, we show experimental and DFT analyses of the A2 and A3 alloys (TEC = 10.52 × 10⁻⁶/K and 1.41 × 10⁻⁶/K, respectively) in Fig. 4, G to L. It can be seen in Fig. 4, G to J, that the A2 and A3 alloys have a single-phase bcc and fcc structure, respectively. The partial disordered local moment (PDLM) model within the coherent potential approximation simulations (40) reveals that the Invar effect is qualitatively related to the volume reduction of the finite-temperature PDLM phase compared with the 0 K ferromagnetic ground state (42). In contrast to the fcc A3 alloy, the bcc A2 alloy, with a higher T_c around 950 K, exhibits a slight upward trend of the lattice parameter a. Using DFT simulations, we also validate that if the A2 alloy can be stabilized in its fcc phase state, then an Invar effect can be realized as well [Fig. 4, K and L, red dash-dot line; for simulation details, see (31)]. The TEC value is also affected by the occurrence of phase transformations in some HEAs (18, 20). Our measurements show that the low TEC values of our A3 alloys are not caused by any phase transformation.

In Fig. 5A, we show the TEC as a function of temperature for the two Invar alloys (TEC ≈ 2 × 10⁻⁶/K) and two Kovar alloys (TEC ~5 × 10⁻⁶/K) that we developed, compared with HEAs and medium-entropy alloys (MEAs) (19, 42). The new alloys show abnormally
low TEC values compared with the HEAs, 
MEAs, and conventional alloys previously 
reported (Fig. 5B) (43-45). Most Invar alloys 
show a low TEC but also low configurational 
entropy. The Invar alloys developed in this 
work offer a good combination of low TEC 
and high configurational entropy. This indi- 
cates the high potential of the HEA concept 
for the design of Invar alloys, which, beyond 
their beneficial thermal expansion response, 
also offer high strength, ductility, and cor- 
rosion resistance. 
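
To make the entropy axis of Fig. 5B concrete, the sketch below evaluates the ideal-mixing configurational entropy, S_conf = -R Σ x_i ln x_i, in units of the gas constant R. This is the standard textbook expression and is assumed here rather than taken from the paper; the function name is hypothetical, and the inputs must be mole fractions (compositions given in wt % would first need converting).

```python
import math

def configurational_entropy_over_R(mole_fractions):
    """Ideal-mixing configurational entropy divided by R: -sum(x_i * ln x_i)."""
    total = sum(mole_fractions)
    if total <= 0:
        raise ValueError("mole fractions must sum to a positive value")
    # Normalize so the fractions sum to 1 and skip absent elements (x = 0).
    xs = [x / total for x in mole_fractions if x > 0]
    return -sum(x * math.log(x) for x in xs)

# Equiatomic four-element alloy: the entropy equals ln(4) in units of R.
four_element = configurational_entropy_over_R([0.25, 0.25, 0.25, 0.25])

# Hypothetical binary with a 65:35 mole ratio: lower than the ln(2) of a 50:50 binary.
binary = configurational_entropy_over_R([0.65, 0.35])
```

The comparison illustrates why conventional binary Invar alloys sit at low configurational entropy, whereas alloys with four or five principal elements sit higher.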


Conclusions 


Understanding the underlying physics behind 
composition-property relations is the key mis- 
sion in alloy design, a task particularly chal- 
lenging in the case of compositionally complex 
materials. In principle, HEAs with interesting features can hide in a practically infinite and vastly unexplored composition space, a scenario that puts targeted alloy design to its
hardest test. We have therefore developed a 
widely applicable active learning framework 
that combines a generative model, regression 
ensemble, physics-driven learning, and experi- 
ments for the compositional design of HEAs. 
Our method demonstrates its proficiency in 
designing high-entropy Invar alloys using 
very sparse experimental data. The entire 
workflow required only a few months, in con- 
trast to the conventional alloy design approach, 
which requires years and many more experi- 
ments. We expect that more than one prop- 
erty can be optimized simultaneously using 
the GAD-TERM framework in the compositional spectrum of HEAs.


COMMUNITY ECOLOGY


Emergent phases of ecological diversity and 
dynamics mapped in microcosms 


Jiliang Hu, Daniel R. Amor, Matthieu Barbier, Guy Bunin, Jeff Gore


From tropical forests to gut microbiomes, ecological communities host notably high numbers of coexisting species. Beyond high biodiversity, communities exhibit a range of complex dynamics that are difficult to explain under a unified framework. Using bacterial microcosms, we performed a direct test of theory predicting that simple community-level features dictate emergent behaviors of communities. As either the number of species or the strength of interactions increases, we show that microbial ecosystems transition between three distinct dynamical phases, from a stable equilibrium in which all species coexist to partial coexistence to emergence of persistent fluctuations in species abundances, in the order predicted by theory. Under fixed conditions, high biodiversity and fluctuations reinforce each other. Our results demonstrate predictable emergent patterns of diversity and dynamics in ecological communities.


In nature, species reside and interact with myriad other species in complex communities (1). Central challenges in ecology include understanding how many species are able to coexist, why biodiversity is higher
in some places than others, why communities 
show varying dynamical behaviors (2, 3), and 
how these factors shape ecosystem functioning 
(4). A long-standing debate concerns whether 
the diversity of a community enhances or weak- 
ens its stability (5). By studying natural com- 
munities, ecologists have identified potential 
environmental drivers that could affect both 
biodiversity and community dynamics (6). Lab- 
oratory experiments facilitate disentangling such 
environmental drivers from inherent community 
properties, such as species interactions, that 
may also shape biodiversity and dynamics. 
Experimental communities with few species have been shown to display predictable dynamics, such as stable equilibria and periodic oscillations (7-12), and have allowed an understanding of the role of interactions ranging from predation (9-11) to competition (7, 8) to cross-feeding (12). In more biodiverse
laboratory microcosms derived from natural 
habitats, however, community composition is 
only reproducible and predictable at family 
or higher levels of taxonomy (3, 13-15). Given 
the relative inaccessibility of detailed infor- 
mation on the ecological roles of every spe- 
cies (capturing every interaction strength, 
growth rate, and carrying capacity, among 
others), the question arises: Is it possible to 
predict the biodiversity and dynamics of these 
complex communities with simple community- 
level parameters? 

Starting with the pioneering work by 
Robert May (16), ecologists have sought to predict community behaviors using community-level parameters such as the number of species and the distribution of interaction strengths between species. The interaction strengths quantify how strongly a species influences the growth and survival of other organisms in the community and therefore determine the overall composition and stability of communities (14). May and others have suggested that a large number of species and strong interactions lead to instability of community dynamics (16-20), yet we still do not understand how communities behave beyond the transition to instability. Recent theory suggests that a fraction of species tend to go extinct before the community loses stability (21-23) and that unstable communities can exhibit fluctuations, which could in turn reinforce biodiversity (24-30). This body of theory has been difficult to validate because the associated parameters are hard to estimate and manipulate (31). Experimental microcosms
have now reached the necessary controlla- 
bility (8, 14, 15) to test theoretical predictions 
based on community-level parameters of eco- 
logical communities. We aim to uncover the 
relationship between stability and diversity 
through experimentally controlling two fac- 
tors that are usually unobservable in natural 
settings: the strength of interspecies inter- 
actions and the number of species introduced 
in the experiment (referred to as the species 
pool size). 

We began by summarizing the predictions 
on community dynamics and biodiversity from 
the well-known generalized Lotka-Volterra 
model, modified to include dispersal from a 
species pool: 


$$\frac{dN_i}{dt} = N_i\left(1 - \sum_{j=1}^{S} \alpha_{ij} N_j\right) + D \qquad (1)$$


where N_i is the abundance of species i (normalized to its carrying capacity), α_ij is the interaction strength that captures how strongly species j inhibits species i (with self-regulation α_ii = 1), and D is the dispersal rate. We simulated the dynamics of communities with different species pool sizes S and interaction matrices. We sampled the interaction strength from a uniform distribution U[0, 2⟨α_ij⟩], where ⟨α_ij⟩ is the mean interaction strength between species [which also determines the variance of interactions; the values of ⟨α_ij⟩ and std(α_ij) increase proportionally in this study, where std(α_ij) is the standard deviation of interactions; see (32)]. Modeling species interactions
as a random interaction network captures species 
heterogeneity without assuming any particular 
community structure (16, 17, 23). 
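
A minimal simulation sketch of the model in Eq. 1 is given below. It draws off-diagonal interaction strengths from U[0, 2⟨α_ij⟩] with α_ii = 1, as described above, and integrates the equations with a simple forward-Euler scheme. The dispersal rate, time step, number of steps, and initial abundance are illustrative assumptions, not the values used in the study, and the function names are hypothetical.

```python
import random

def make_interaction_matrix(pool_size, mean_alpha, rng):
    """alpha_ii = 1; off-diagonal entries are drawn from U[0, 2 * mean_alpha]."""
    return [[1.0 if i == j else rng.uniform(0.0, 2.0 * mean_alpha)
             for j in range(pool_size)] for i in range(pool_size)]

def simulate_glv(alpha, dispersal=1e-6, dt=0.01, steps=20000, n0=0.1):
    """Forward-Euler integration of dN_i/dt = N_i (1 - sum_j alpha_ij N_j) + D."""
    pool_size = len(alpha)
    abundances = [n0] * pool_size
    trajectory = [abundances[:]]
    for _ in range(steps):
        rates = []
        for i in range(pool_size):
            inhibition = sum(alpha[i][j] * abundances[j] for j in range(pool_size))
            rates.append(abundances[i] * (1.0 - inhibition) + dispersal)
        # Clamp at zero so numerical error cannot produce negative abundances.
        abundances = [max(n + dt * r, 0.0) for n, r in zip(abundances, rates)]
        trajectory.append(abundances[:])
    return trajectory

# Small community: S = 4 species with mean interaction strength 0.3.
rng = random.Random(0)
alpha = make_interaction_matrix(4, 0.3, rng)
trajectory = simulate_glv(alpha)
final_abundances = trajectory[-1]
```

Repeating the simulation over many random matrices for each pair (S, ⟨α_ij⟩) would give the averaged survival and fluctuation fractions that the phase diagrams summarize.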

Fig. 1. Theory predicts that species pool size and interspecies interaction strength shape phases of community diversity and dynamics. (A) Representative time series of species abundance for the qualitatively different dynamics of communities with different species pool size S, under interaction strength ⟨α_ij⟩ = 0.3. Communities transition from stable full coexistence (S = 4) to stable partial coexistence (S = 20) to persistent fluctuations (S = 80). (B) Increasing interaction strength while fixing the species pool size reveals analogous transitions. The values of ⟨α_ij⟩ and std(α_ij) increase proportionally in this study. (C and D) Mean fractions of species that survive in the community (C) and communities that exhibit persistent fluctuations (D). As interaction strength increases, communities lose species (dashed vertical line, transition from phase I to phase II) before losing stability (solid vertical line, transition from phase II to phase III). (E and F) Mapping the survival fraction (E) and community fluctuation fraction (F) onto the phase space reveals that this sequence (phase I to phase II to phase III) of phase transitions is maintained as either of the control parameters increases. The gray dashed (solid) line shows the analytical solution for the survival (stability) boundary. The color maps depict the mean value over 1000 simulations (32).

Our simulations revealed a strong dependence of biodiversity (number of coexisting species) and dynamics on both the species pool size S (Fig. 1A) and interaction strength ⟨α_ij⟩ (Fig. 1B). As either of these parameters increases, communities experience a transition from stable full coexistence (phase I: all species survive and reach stable abundances) to stable partial coexistence (phase II: some species go extinct, and the surviving ones reach stability) to persistent fluctuations in species abundances and biomass (phase III) [figs. S1 to S3 and (32)]. The transition to unstable dynamics (phase II to phase III) corresponds with the loss of linear stability of the equilibrium, consistent with May’s theory (fig. S4). These results agree with recent theory that derived analytically the existence of a phase transition from a distinct stable state (phases I and II) to persistent fluctuations (phase III) (21, 22).
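
The three phases can be told apart from a simulated or measured time series with a simple heuristic, sketched below. The extinction threshold, the length of the final window, and the coefficient-of-variation tolerance are all illustrative assumptions rather than the criteria used in the study, and `classify_phase` is a hypothetical helper.

```python
import math

def classify_phase(trajectory, extinction_threshold=1e-3, window=1000, cv_tolerance=0.05):
    """Return (phase, survival_fraction) from a list of abundance vectors over time."""
    tail = trajectory[-window:]
    pool_size = len(tail[0])
    survivors = [i for i in range(pool_size) if tail[-1][i] > extinction_threshold]
    fluctuating = False
    for i in survivors:
        series = [state[i] for state in tail]
        mean = sum(series) / len(series)
        variance = sum((x - mean) ** 2 for x in series) / len(series)
        # A survivor fluctuates if its coefficient of variation exceeds the tolerance.
        if mean > 0 and math.sqrt(variance) / mean > cv_tolerance:
            fluctuating = True
    survival_fraction = len(survivors) / pool_size
    if fluctuating:
        phase = "III"   # persistent fluctuations
    elif survival_fraction == 1.0:
        phase = "I"     # stable full coexistence
    else:
        phase = "II"    # stable partial coexistence
    return phase, survival_fraction

# Synthetic two-species trajectories illustrating each outcome.
stable_full = [[0.5, 0.5] for _ in range(2000)]
stable_partial = [[0.5, 0.0] for _ in range(2000)]
oscillating = [[0.5 + 0.3 * math.sin(t / 10.0), 0.5] for t in range(2000)]
results = [classify_phase(t) for t in (stable_full, stable_partial, oscillating)]
```

Averaging the survival fraction and the fraction of trajectories labeled phase III over many random communities would yield quantities analogous to those mapped in Fig. 1, E and F.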

To address the ecological implications of these dynamical phases, we analyzed both the fraction of species that survive at equilibrium (Fig. 1, C and E, and fig. S5) and the fraction of communities that exhibit persistent fluctuations (Fig. 1, D and F). We found that the sequence of dynamical phases is generic across the parameter space: Communities generally experience species extinctions before they lose stability as either of the control parameters increases. This sequence is both predicted by analytical expressions for the phase boundaries (Fig. 1, C to F) and robust to different choices of interaction strength distributions and modeling assumptions (figs. S6 and S7) (21). In particular, natural ecological communities display diverse interaction types, which affects the degree of symmetry in the interaction matrix (α_ij) (e.g., competition and mutualism may be symmetrical, whereas predation is antisymmetrical). We found that varying these properties of the interaction matrix does not qualitatively affect the dynamical phases (figs. S8 and S9). Other model choices, such as considering pH-mediated interactions or the serial dilution of communities into fresh media (figs. S9 to S12) (14), further showed the robustness and generic nature of the dynamical phases. Therefore, it may be possible to predict the diversity and dynamics of ecological communities from community-level features of the interaction network.

Fig. 2. Increasing species pool size or interaction strength leads to loss of stability in microbial communities. (A) We used a library of 48 bacteria to generate species pools of different sizes and compositions. Cocultures underwent serial dilutions with additional dispersal from the pool. Community composition and total biomass were monitored through 16S sequencing and optical density (OD). (B) In two-species cocultures, interaction strengths leading to the loss of coexistence (α > 1) increase in frequency with nutrient concentration. Error bars represent SEM; n = 30. (C) Fluctuations in community biomass increase with either species pool size or interaction strength. Solid lines represent eight different species pool compositions (dashed lines represent replicates of the 48-species community). Purple (orange) lines highlight stable (fluctuating) dynamics. (D) Under high nutrient concentration, half of the 12-species communities exhibit persistent fluctuations (top panels) in species abundances and the rest reached stability (bottom panels). (E) Time series (top panels) for the species abundances in 48-species communities. Stability was reached only under low nutrient concentration, and variability in end-point relative abundances increased with nutrient concentration (bottom panels) (fig. S15). Relative abundance plots show the amplicon sequence variant data of individual replicates.

Fig. 3. Species pool size and interaction strength determine the diversity and dynamics of experimental communities. (A) Fraction of surviving species decreases with either species pool size or interaction strength (nutrient concentration). The survival fraction decreases more slowly at high S and strong interaction strength. (B) Fraction of fluctuating communities increases with either species pool size or interaction strength. (C) Phase diagram for the fraction of species surviving in experimental communities. As communities cross the boundary of phase I (dashed white line), they experience species extinctions, with a fast decay in survival fraction through phase II and a relative maintenance of survival fraction through phase III. The solid white line indicates the stability boundary. (D) Phase diagram for the fraction of fluctuating communities in experiments. Communities start exhibiting persistent fluctuations after crossing the boundary into phase III (solid gray line). The dashed gray line indicates the survival boundary. In (A) and (B), error bars represent SEM; n = 8.

Fig. 4. Fluctuating communities are more diverse than stable communities under the same conditions. (A) As the average survival fraction (blue line) decreases with increasing species pool size S in simulations, more communities exhibit fluctuations in species abundances (orange data points). Whereas stable communities (purple data points) exhibit a steady decrease in species survival fraction with S, the loss of species is slower in fluctuating communities. Each data point represents an individual community. (B) In experiments under high nutrient concentration (also under lower nutrient concentration; fig. S28), fluctuating communities exhibit a higher survival fraction than stable communities. The survival fractions of 88% (±5%) of the fluctuating communities are above or equal to the mean, as compared with 14% (±6%) in the case of stable communities [p < 0.01; (32)]. Error bars represent SEM; n = 8.

To experimentally test the predicted phase 
transitions, we built synthetic communities 
using a library of 48 bacterial isolates from 
terrestrial environments [figs. $13 and S14 
and (32)]. After inoculation, we exposed com- 
munities to cycles of growth, dispersal from 
the pool, and dilution while monitoring com- 
munity composition and biomass at the end 
of each daily cycle [Fig. 2A and (32)]. Leveraging
previous work (14, 33), we tested media
conditions to tune the strength of bacterial 
interactions. We found that the probability of 
coexistence in pairwise coculture decreased 
with the concentration of supplemented glu- 
cose and urea. In this medium, an increase 
in the concentration of these nutrients there- 
fore increases the strength of competitive 
interactions (Fig. 2B and tables S1 to S3). 
As discussed in our previous work (14, 33), 
high nutrient concentrations lead to extensive 
modification of the media (e.g., pH) and hence 
stronger interactions. This experimental plat- 
form allows us to control the key parameters 
established by theory: species pool size and 
interaction strength. 

We experimentally mapped the phase space 
of community dynamics by exposing 63 spe- 
cies pools to three levels of interaction strength. 
Specifically, we tested 30 species pairs (S = 2); 
eight different communities for each size S = 3, 
6, 12, and 24; and one community of S = 48 
(the full species library). The resulting bio- 
mass time series were relatively stable under 
low interaction strength and small species 
pool size, whereas increasing these two var- 
iables progressively led to a higher fraction 
of communities exhibiting biomass fluctua- 
tions (Fig. 2C). Analyzing species abundances 
through 16S sequencing (Fig. 2, D and E), we 
found that biomass fluctuations were highly 
correlated with species abundance fluctua- 
tions (figs. S15 and S16). For example, for com- 
munities with 12 species in the pool and high 
nutrient concentration, four communities 
reached stable equilibria and the remaining 
four exhibited fluctuations in both biomass 
and species abundances until the end of the 
experiment (Fig. 2, C and D). Replicates with 
identical species pool composition exhibited 
highly reproducible dynamics (figs. S17 to S25), 
and the classification of stable and fluctuating 
communities was robust to different methods 
that analyzed biomass, species composition, 
and variations between replicates [figs. S15 
and S16 and (32)]. We also experimentally observed
this transition toward unstable dynamics
under different carbon sources and dilution 
frequencies (fig. S27). Therefore, synthetic mi- 
crobial communities lose stability as either spe- 
cies pool size (for S > 2) or interaction strength 
increases. 

To understand the relationship between spe- 
cies extinctions and loss of community stability, 
we analyzed species survival across these ex- 
periments. As expected, the fraction of sur- 
viving species decreased with an increase in 
either species pool size or interaction strength, 
as determined by nutrient concentration (Fig. 3A). 
For example, at medium interaction strength, 
83% (±3%) of species were able to survive in
the 30 pairwise (S = 2) cocultures, whereas this
frequency decreased to 36% (±7%) among the
eight different combinations of six-species com- 
munities (S = 6; Fig. 3A). Despite the pronounced 
loss of species, none of these communities 
displayed persistent fluctuations (Fig. 3B). 
Such fluctuations arose with further increase 
of the species pool size, with half of the 24- 
species combinations displaying fluctuations 
(Fig. 3B). Notably, the species survival fraction 
displayed only a modest decrease entering the 
fluctuation regime, with 24% (±2%) of species
surviving in the 24-species communities as
compared with 36% (±7%) in the six-species
communities (Fig. 3, A and B). Mapping these 
experimental results over the phase space (Fig. 
3, C and D) confirmed the theoretically pre- 
dicted (Fig. 1, E and F) sequence of transitions: 
Communities experience species extinctions 
before exhibiting persistent fluctuations, as 
either species pool size or interaction strength 
increases. 

Next, through analyzing species survival 
fraction across different species pool composi- 
tions, we addressed how fluctuations and diver- 
sity may influence each other. In simulations, 
the fraction of surviving species revealed a 
generic trend: For the same species pool size 
and interaction strength, fluctuating commu- 
nities were more diverse than stable commu- 
nities (Fig. 4A). This trend was also observed 
in experiments: Most fluctuating communities 
reached higher survival fractions than stable 
communities reached under the same con- 
ditions (Fig. 4B and fig. S28). For example, 
within the 12-species communities, fluctuating 
communities had on average 5 ± 1 species
surviving, as compared with only 2 ± 1 species
surviving in stable communities. Among the
fluctuating communities, 88% (±5%) exhibited
survival fractions above or equal to the mean,
as compared with only 14% (±6%) among the
stable communities [p < 0.01; (32)]. Both ex- 
periments and simulations suggest that fluc- 
tuations are an emergent, diversity-dependent 
phenomenon, because the addition of spe- 
cies pools from stable communities often 
yielded larger, fluctuating communities (fig. 
S29). We also found numerically that fluctuations
and high diversity disappeared together
as we stopped dispersal or pinned the abun- 
dance of the most abundant species (fig. S1). 
Our results show that diversity and per- 
sistent fluctuations enhance each other, as 
theoretically demonstrated in previous work 
(25, 26). 

Our findings are consistent with two major 
ideas in theoretical ecology: May’s suggestion 
that complexity leads to instability (16) and 
Chesson’s argument that temporal fluctua- 
tions can help maintain diversity (34). The 
question of whether complex dynamics are in- 
herent to the ecological community—arising 
from species interactions—or driven by envi- 
ronmental factors has received considerable 
attention yet has seldom undergone a direct 
experimental test in many-species communities. 
Under laboratory conditions that minimize 
environmental stochasticity, and in agreement 
with recent theory (21, 23, 35), we found that 
community-level parameters representing 
species diversity and interactions are suf- 
ficient to predict the dynamical behaviors 
of complex ecological communities. These 
predictions are theoretically robust to vary- 
ing biological assumptions [e.g., intraspecific 
diversity and interspecies interaction mecha- 
nisms, including resource-explicit models (36)]. 
Therefore, the emergent phases of biodiversity 
and dynamics that we observed in this study 
may occur in a wide range of ecological com- 
munities. Future work should study whether 
these phases generalize across spatiotemporal 
scales, environmental conditions, and orga- 
nism types to understand their prevalence 
and importance in shaping major ecological 
patterns (37, 38). 
MONKEYPOX


Heavy-tailed sexual contact networks and monkeypox 
epidemiology in the global outbreak, 2022 


Akira Endo*, Hiroaki Murayama, Sam Abbott, Ruwan Ratnayake, Carl A. B. Pearson,
W. John Edmunds, Elizabeth Fearon†, Sebastian Funk†


The outbreak of monkeypox across non-endemic regions confirmed in May 2022 shows epidemiological 
features distinct from previously imported outbreaks, most notably its observed growth and predominance 
amongst men who have sex with men (MSM). We use a transmission model fitted to empirical sexual 
partnership data to show that the heavy-tailed sexual partnership distribution, in which a handful of 
individuals have disproportionately many partners, can explain the sustained growth of monkeypox among 
MSM despite the absence of such patterns previously. We suggest that the basic reproduction number (R0)
for monkeypox over the MSM sexual network may be substantially above 1, which poses challenges to 
outbreak containment. Ensuring support and tailored messaging to facilitate prevention and early detection 
among MSM with high numbers of partners is warranted. 


In May 2022 multiple countries in Europe,
North America, and elsewhere reported
clusters of monkeypox cases (1-4). As of
31 May 2022 (time of analysis) a total of
728 confirmed and suspected cases have
been reported in more than 25 countries from 
previously non-endemic regions (5). The global 
case count has substantially grown since and 
exceeded 30,000 from over 80 countries as of 
10 August 2022 (6). To date, the reported cases 
are predominantly, but not exclusively, among 
young males without a travel history to en- 
demic regions in Central and West Africa (2, 3). 
Initial epidemiological investigations suggested 
a link with sexual contact among men who 
have sex with men (MSM) (1-3, 7). Prior to the
current outbreak, monkeypox infections had 
been assumed to be primarily caused by expo- 
sure to animal reservoirs but human-to-human 
transmissions through direct routes including 
skin-to-skin contact, bodily fluids, and respira- 
tory droplets have also been documented (8, 9). 
Sexually associated exposure to skin lesions, 
droplets, and fomites could plausibly be a risk 
for transmission regardless of whether mon- 
keypox is truly sexually transmissible (e.g., 
through semen). Previous studies of monkey- 
pox outbreaks indicate a secondary attack risk 
(SAR) of ~10% among household members 
without smallpox vaccinations (8-10); the small- 
pox vaccine has been shown to be protective 
against monkeypox with an estimated effec- 
tiveness of 85% (10). Investigations of previous 
outbreaks in Central and West Africa identified
a relatively limited proportion of cases of
human-to-human transmission, with at most 
seven generations observed (8, 9, 11, 12), and 
previous estimates of the basic reproduction 
number (R0) for monkeypox have been below
1 even in unvaccinated populations (9, 10).
Sporadic monkeypox outbreaks associated 
with imported animals or imported cases 
have been observed in non-endemic regions 
(13-17) but subsequent human-to-human spread 
was rarely observed. Prior to the current out- 
break, only one health care worker and two 
household contacts of an imported case had 
been identified as likely secondary cases in 
non-endemic settings (15, 17). 

The current spread of monkeypox in non- 
endemic regions appears in stark contrast to 
these previous events. Most cases have no docu- 
mented exposure to animals or travel history to 
endemic settings. The rapid growth in notified 
cases and geographical dispersal suggest sub- 
stantial human-to-human transmission, rather 
than incidence driven by spillover from an ani- 
mal reservoir. This is also the first widespread 
outbreak of monkeypox predominantly in MSM 
with suggested sexually associated transmission 
(7, 18), although higher prevalence in young
males and frequent observation of genital le- 
sions have also been documented in a recent 
outbreak in Nigeria (12, 19, 20). Proposed ex- 
planations for the novel character of the cur- 
rent outbreak include increased importation, 
undetected community-wide transmission, viral 
evolution, and increased susceptibility due to the 
end of smallpox vaccination globally (7, 17, 18, 21). 
Although these theories are consistent with 
some aspects of the current observation, most 
are not strongly supported by external (if in- 
direct) evidence nor do they provide a coher- 
ent explanation on why a similar monkeypox 
outbreak involving substantial human-to-human 
transmission in a focal, rather than generalized, 
population had not arisen from the series of
importation events documented in non-endemic
settings starting in 2003 (13-17). 

We show that transmission over a sexual 
contact network empirically characterized by 
a heavy-tailed partnership distribution can rea- 
sonably explain the rapid growth of human-to- 
human transmission in the current monkeypox 
outbreak despite the absence of such patterns 
of spread in the past. Specifically, it is plausible 
that monkeypox has had a substantial trans- 
mission potential in the MSM sexual contact 
network but because of the small cumulative 
number of imported cases in non-endemic set- 
tings, it had not reached members of this net- 
work with high numbers of contacts from whom 
onward transmission was most probable. The 
main analysis of this study was conducted using 
only information available as of 31 May 2022, a 
few weeks after the outbreak had been first re- 
cognized, and the original version was submitted 
on 12 June 2022 [available from (22)] to provide 
key insights from the earliest data available. 
We retain this original context of the analysis 
in this paper to highlight that the findings were 
obtainable in the earliest phase of the outbreak 
and discuss them in retrospect given the up- 
dated situation since the time of analysis. 

Previous work on sexual partnership distri- 
butions (i.e., degree distribution of sexual con- 
tact networks) often fitted Pareto distributions 
to the reported number of partners over a spe- 
cific time window (e.g., over a year) (23, 24). 
However, Pareto distributions can be scale-free, 
which causes the modeled networks to have 
some individuals with impossibly high numbers
of partners and as a result R0 defined in the
networks tends to infinity (25). We found that 
Pareto distributions do not describe existing 
datasets of MSM partnerships well (see supple- 
mentary materials). The Weibull distribution 
is an alternative distribution that also has a 
heavy-tailed shape (20) and does not exhibit the 
unphysical features of the Pareto distribution. 
Using Weibull distributions fitted to the empir- 
ical data on same-sex and opposite-sex sexual 
partnerships of the UK population aged 18 to 44 
[from the National Surveys of Sexual Attitudes 
and Lifestyles (Natsal)] (26-28), we constructed 
a branching process model of transmission over 
sexual contact networks. Following (25), we 
assumed that individuals can become infected 
at a probability proportional to their network 
degree (i.e., those with a large number of part- 
ners are more likely to be chosen). This assump- 
tion neglects the possible existence of densely 
clustered “core groups” (29); see supplementary 
materials for sensitivity analysis. We assumed 
an infectious period of 21 days for monkeypox 
based on the documented duration of illness 
(30-32), which we also varied in our sensitivity 
analysis to account for variation and possible 
behavior changes in symptomatic individuals. 
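
As a rough illustration of why the tail of the partnership distribution matters, the following Python sketch computes a reproduction number on a configuration-type network in which infection reaches individuals in proportion to their degree. It relies on the standard textbook result R0 = SAR x (E[k^2] - E[k]) / E[k], which is an assumption made here for illustration and not necessarily the exact formulation used in the paper. The function names, the binning of the Weibull into integer degrees, and any shape and scale values passed in are hypothetical; they are not the fitted Natsal estimates.

```python
import math


def discretized_weibull_pmf(shape, scale, k_max=10_000):
    """Return P(K = k) for k = 0..k_max by binning a Weibull over [k, k + 1)."""
    def cdf(x):
        return 1.0 - math.exp(-((x / scale) ** shape))

    pmf = [cdf(k + 1) - cdf(k) for k in range(k_max + 1)]
    total = sum(pmf)  # renormalize after truncating at k_max
    return [p / total for p in pmf]


def network_r0(pmf, sar):
    """R0 when infectees are reached in proportion to their degree.

    A newly infected person of degree k has k - 1 remaining partners,
    each of whom is infected with probability `sar`.
    """
    mean_k = sum(k * p for k, p in enumerate(pmf))
    mean_k2 = sum(k * k * p for k, p in enumerate(pmf))
    return sar * (mean_k2 - mean_k) / mean_k
```

Comparing `network_r0(discretized_weibull_pmf(0.3, 0.5), 0.1)` with `network_r0(discretized_weibull_pmf(2.0, 5.0), 0.1)` shows how a heavier tail raises R0 at the same SAR. Both parameter pairs are made up for the comparison.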

With this model, we simulated sexually asso- 
ciated outbreaks of monkeypox in MSM and 
non-MSM populations, under varying assumptions
for the risk of transmission between
sexual partners (SAR per sexually associated 
contact during the infectious period) between 
0 and 100% in the absence of empirical data 
on this parameter. Starting from a specified 
number of initial cases, we simulated the num- 
ber of cases in each generation of transmission 
over MSM and non-MSM sexual contact net- 
works. For the non-MSM sexual network, we 
assumed that the initial cases have equal 
chances of being male or female and that sub- 
sequent generations of infection alternate be- 
tween heterosexual (HS) men and women. 
Women who have sex with women were not 
considered in our analysis as their partnership 
distribution suggested a substantially lower 
transmission potential than the HS network 
(see supplementary materials). In the model, 
we considered only sexually associated trans- 
mission over separate sexual contact networks 
of MSM and non-MSM and did not explicitly 
model other transmission routes (non-sexually 
associated skin-to-skin contact, respiratory 
droplets, fomites, and so on) or links between 
MSM and non-MSM sexual contact networks, 
except for the initial cases. We discuss trans- 
mission dynamics of monkeypox as a mixture 
of these transmission routes in a separate 
analysis using the next generation matrix (33). 


Fig. 1. Likelihood of observing an outbreak given introduction events of different profiles.
(A to C) Likelihood of observing an outbreak of the current size (728 cases) or greater over
the MSM sexual contact network given initial cases who are (A) MSM with sexually associated
exposure; (B) MSM with non-sexually associated exposure; (C) random cases from the general
population with non-sexually associated exposure. The likelihood was computed from 100,000
simulations for each value of SAR, varied between 0 and 100%.
The methodology used is described in more
detail in the supplementary materials. 

We first simulated the probability of observ- 
ing a chain of transmission of a size equal to 
or greater than the current global monkeypox 
outbreak (728 cases as of 31 May 2022) in the 
MSM population generated from a given num- 
ber of initial cases. We considered three scenar- 
ios for the profile of introduction events: (i) The 
initial cases in the MSM population acquired 
infection through a sexually associated route, 
i.e., the numbers of their sexual partners are 
preferentially drawn from the tail of the sexual 
partnership distribution; (ii) the initial cases 
in the MSM population were non-sexually as- 
sociated and therefore their partnership degrees 
were drawn from across the full distribution; 
(iii) initial cases were from the general population
(of which 2% were assumed to be sexually
active MSM based on the Natsal datasets)
who acquired infection through non-sexually 
associated routes. We then simulated the prob- 
ability of the current outbreak leading to a 
major outbreak over the MSM sexual contact 
network. For comparison, outbreaks over the 
non-MSM sexual contact network—given spe- 
cified numbers of initial cases (either sexually 
associated or non-sexually associated)—were 
also simulated. Our estimates suggest that with 
a range of sexually associated SAR values comparable
to or greater than the previous estimates
of household SAR (8-10), even one event
of sexually associated transmission to the MSM 
population (scenario i) is consistent with a high 
likelihood (20 to 50%) of observing an outbreak 
of the current size or greater (Fig. 1A). The like- 
lihood becomes smaller (although not negligi- 
ble) if non-sexually associated exposure (e.g., 
exposure to animals or nonsexual direct con- 
tact with cases; scenario ii) is involved in the 
introduction to the MSM population (Fig. 1B). 
By contrast, 50 to 100 or more non-sexually as- 
sociated initial cases from the general popula- 
tion (scenario iii) would have been necessary 
for the likelihood of an outbreak of the current 
size in the MSM population to be around the 
order of 1 to 20% (Fig. 1C). 
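
The three introduction scenarios can be sketched as a Monte Carlo branching process. The Python code below is a minimal illustration under stated assumptions, not the authors' implementation: the function name and arguments are hypothetical, seeds with sexually associated exposure are drawn from the size-biased degree distribution and lose one partnership to their infector, and the outbreak size counts secondary cases only (initial cases excluded). Scenario (iii) is not modeled separately here; it would additionally thin the seeds to the assumed MSM fraction.

```python
import itertools
import random


def outbreak_probability(pmf, sar, n_initial, threshold,
                         sexually_associated, n_sims, seed=0):
    """Estimate P(at least `threshold` secondary cases) by simulation.

    pmf[k] is the probability that a person has k partners over the
    infectious period (a hypothetical input distribution).
    """
    rng = random.Random(seed)
    degrees = list(range(len(pmf)))
    plain = list(itertools.accumulate(pmf))
    # Size-biased weights: people are reached in proportion to their degree.
    biased = list(itertools.accumulate(k * p for k, p in enumerate(pmf)))
    hits = 0
    for _ in range(n_sims):
        if sexually_associated:
            seeds = rng.choices(degrees, cum_weights=biased, k=n_initial)
            active = [k - 1 for k in seeds]  # one partner was the infector
        else:
            active = rng.choices(degrees, cum_weights=plain, k=n_initial)
        total = 0
        while active and total < threshold:
            contacts = active.pop()
            # Each remaining partner is infected independently with prob. sar.
            new = sum(rng.random() < sar for _ in range(contacts))
            total += new
            infectees = rng.choices(degrees, cum_weights=biased, k=new)
            active.extend(k - 1 for k in infectees)
        hits += total >= threshold
    return hits / n_sims
```

Calling the function with `sexually_associated=True` and then `False` for the same `pmf`, `sar`, and `n_initial` gives a rough analogue of the contrast between scenarios (i) and (ii). The simulation stops as soon as the threshold is reached, so supercritical runs stay cheap.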

These results suggest that a small number 
of sexually associated transmissions among 
the MSM population were sufficient to cause a 
large outbreak over the MSM sexual network, 
as currently observed, but that the number of 
non-sexually associated imported cases re- 
quired for the virus to achieve the first few 
instances of sexually associated transmission 
among MSM is relatively large. The cumula- 
tive number of documented imported cases in 
non-endemic settings had been up to ~100 be- 
fore May 2022 (13-17); it is therefore unsurpris- 
ing that introduction to the MSM population in 
non-endemic settings has never been observed
previously, assuming that the importations 
had been mostly non-sexually associated cases 
from the general population. The current out- 
break in the MSM population may have been 
introduced by an eventual introduction follow- 
ing non-sexually associated importations or 
alternatively by one or more sexually asso- 
ciated importations acquired in the endemic 
setting. In the latter case, a sexually associated 
outbreak among MSM might also be ongoing 
in the endemic settings, which warrants fur- 
ther surveillance. All scenarios projected that 
without interventions or changes to sexual 
behavior, a major outbreak in the MSM population
(defined as ≥10,000 cases excluding
initial cases) is highly likely given the current 
outbreak size (Table 1); this projection, based 
on the data as of 31 May 2022, turned out cor- 
rect in retrospect (6). By contrast, sustained 
transmission over the non-MSM sexual con- 
tact network was unlikely in all scenarios con- 
sidered (Table 1), owing to the less heavy tail of 
the corresponding partnership distribution, 
although 10 to 3000 additional cases may be 
observed if a substantial number of infections 
are introduced into the non-MSM sexual con- 
tact network (Fig. 1D). One caveat must be 
noted, however—sustained transmission in 
a local subnetwork among non-MSM that is 
more densely clustered than that modeled 
may still be possible (fig. S4).

The projected values of R0 were almost always
above 1 in the MSM sexual network for a
range of sexually associated SAR, whereas R0
for the non-MSM sexual network was found to
be below 1 unless the SAR was nearly 100%
(Fig. 2A). The potentially high R0 for the MSM
sexual network is particularly concerning as it 
may pose challenges to control efforts (Fig. 
2B). Contact tracing and ring vaccination ap- 
proaches, now being conducted extensively in 
many places with cases, may need to identify 
almost all contacts of a case to bring the epi- 
demic under control (which would not be easily 
achievable in practice) (2) as untraced transmission
may well lead to other sustained transmission
chains. Another possible approach
would be focusing resources on identifying 
acceptable and effective means of prevent- 
ing transmission among those men with the 
highest number of sexual partners, which could 
have a disproportionate effect on transmission 
overall. We modeled the possible effect of such 
interventions by varying the Weibull parame- 
ters for the MSM partnerships such that the 
(effective) numbers of partners at the distribu- 
tion tail are selectively controlled, e.g., through 
reduced contacts or reduced chance of trans- 
mission per contact (see supplementary mate- 
rials for technical details). The level of control 
at the tail is represented by the upper 1st percentile
among those with at least one partner
over 21 days. The R0 may sharply decrease if
control efforts are effective in reducing trans- 
missions at the tail part of the partnership dis- 
tribution (Fig. 2C). This would also lower the 
required intensity of other (nonfocused) mea- 
sures to achieve outbreak control (Fig. 2D). Figure 
2E shows the tail part of the modeled Weibull 
distributions under focused interventions. These 
distributions are most different in the region 
x ≥ 10, suggesting that focusing on those with
more than 10 partners over 21 days would be 
of particular importance. 
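
The effect of tail-focused control can be illustrated with a short Python sketch. Two things here are assumptions rather than the paper's method: the tail is reduced by a hard cap on the number of partners (the paper instead varies the Weibull parameters), and R0 uses the standard degree-biased network formula SAR x (E[k^2] - E[k]) / E[k]. The relative reduction needed for control follows from requiring (1 - c) x R < 1, which gives c > 1 - 1/R. All function names are hypothetical.

```python
def network_r0(pmf, sar):
    """R0 on a network where infectees are reached in proportion to degree."""
    mean_k = sum(k * p for k, p in enumerate(pmf))
    mean_k2 = sum(k * k * p for k, p in enumerate(pmf))
    return sar * (mean_k2 - mean_k) / mean_k


def cap_degrees(pmf, cap):
    """Move all probability mass above `cap` partners onto `cap`.

    A crude stand-in for an intervention that selectively reduces
    (effective) partner numbers at the distribution tail.
    """
    capped = list(pmf[:cap + 1])
    capped += [0.0] * (cap + 1 - len(capped))  # pad if cap exceeds the support
    capped[cap] += sum(pmf[cap + 1:])
    return capped


def required_reduction(r):
    """Smallest relative reduction c such that (1 - c) * r <= 1."""
    return max(0.0, 1.0 - 1.0 / r)
```

With a toy distribution in which 90% of individuals have 1 partner and 10% have 10, capping at 2 partners lowers the R0 multiplier from 9/1.9 to 0.2/1.1, which in turn lowers the additional reduction that other measures must deliver.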

The analysis presented here only considered 
a single outbreak over either the MSM or non- 
MSM sexual contact network with a given 
number of introductions. However, under- 
standing the disease dynamics as a mixture 
of interacting populations through multiple 
modes of transmission is crucial in projecting 
possible future scenarios, especially given the 
known or suggested non-sexually associated 
routes of transmission including through drop- 
lets, fomites, or aerosols (9, 34). One of the key 
questions is whether the current monkeypox 
outbreak can be sustained in the general com- 
munity through non-sexually associated routes, 
i.e., whether the R0 corresponding to non-sexually
associated transmission is above or
below 1. Although we are unable to directly 
answer this question because the presence of 
non-sexually associated epidemiological links 


Table 1. Likelihood of an outbreak over MSM or non-MSM sexual contact network given differ- 
ent numbers and profiles of introduction events. SAR, secondary attack risk; MSM, men who 
have sex with men; S-A, Sexually associated; Gen. pop, general population. 
in the current outbreak is so far largely uncertain
or unknown, we propose a possible approach 
to inferring the role of such transmission. We 
show in Fig. 2F that, in an exponential growth 
phase, the proportion of cases without a sex- 
ually associated epidemiological link among 
total cases will approximately approach the 
ratio between the R0 values over the MSM sexual network
and general non-sexually associated transmission
routes as the outbreak progresses.
One should be able to conclude that the non-sexually
associated R0 for monkeypox is substantially
lower than R0 over the MSM sexual
network if the proportion of non-sexually associated
cases remained low in the future (35);
however, caution is warranted as even in that
case the general transmission R0 may still be
above 1 if R0 over the MSM sexual network is as
high as suggested in some scenarios presented 
in our analysis. As of 10 August 2022, there 
have been sporadic reports of probable non- 
sexually associated cases including 26 known 
pediatric cases aged 0 to 4 (35). However, there 
has been no clear evidence supporting sus- 
tained transmission through non-sexually asso- 
ciated routes. Available data among cases from 
the World Health Organization (98.7% male, 
97.2% self-identified MSM, and 91.5% with 
reported sexual encounters among those who 
provided information) suggests that the cen- 
tral mode of transmission likely remains the 
MSM sexual contact network, although uncer- 
tainty and the possibility of bias remain as a 
result of excessive missing values (35). 

Without needing a novel hypothesis, through 
the use of empirical sexual partnership data our 
results propose a simple but coherent explana- 
tion for a rapidly growing sexually associated 
monkeypox outbreak in non-endemic regions 
linked to the MSM population. We also suggest 
that R0 over the MSM sexual network may be
substantially higher than previous estimates in 
non-sexually associated contexts, if the sexually 
associated SAR is comparable to or greater than 
the household SAR. 

These findings need to be translated into 
control efforts to inform and protect the MSM 
community. Self-sustained transmission over 
the entire non-MSM sexual network or through 
non-sexually associated routes appears less like- 
ly, although many cases may still be observed if 
the outbreak continues to grow in the subsets 
of sexual contact networks at a higher risk of 
transmission. Control efforts, such as contact 
tracing and ring vaccination, need to achieve 
high effectiveness given the large R0 values we
have estimated; focused public health messag- 
ing and support for individuals with multiple 
sexual partners would complement these ap- 
proaches to bring the outbreak under control. 

Our conclusions hinge on the assumed pa- 
rameters from previous outbreaks with differ- 
ent transmission routes, including the SAR 
and infectious period of monkeypox, as well 
as the observed characteristics of the sexual
partnership distribution in the UK and accom- 
panying assumptions. We modeled the glob- 
al transmission of monkeypox over a single 
connected sexual contact network fitted to the 


Fig. 2. Basic reproduction number (R0) of monkeypox over sexual contact networks and control.
(A) Projected R0 over the MSM and non-MSM sexual contact networks based on the Natsal sexual
partnership datasets. The dotted horizontal lines denote the epidemic threshold (R0 = 1).
(B) Relative reduction in R0 required to bring the
(C) Projected reproduction number (R) over the MSM sexual contact networks with different
levels of the partnership distribution tail. Holding the body shape of the distribution
constant (see supplementary materials for details), we adjusted the parameter of the Weibull
distribution to reduce the weight of the distribution tail, which we assume reflects the effect
of interventions for those with the highest numbers of partners. The degree of reduction in the
distribution tail was represented by the upper 1st percentile of the resulting (effective) number
of 21-day sexual partners (among MSM with at least one partner over 21 days). Dashed green
lines indicate the upper 1st percentile of the original Weibull distribution fitted to the
Natsal datasets (baseline). Colors denote different assumptions for the baseline SAR, i.e., risk
of infection per contact without interventions. (D) Relative reduction in R required for
control with different levels of the partnership distribution tail. Using the R
corresponding to the adjusted Weibull distribution as the baseline, additional
relative reduction required to bring the outbreak under control is shown.
(E) Modeled 21-day effective sexual partnership distributions among MSM with
different levels of distribution tail. Histograms represent modified Weibull
distributions under interventions focusing on those with highest numbers of
partners (with the upper 1st percentile of 15 and 10). The original distribution
datasets of the UK population aged 18 to 44.
Populations not represented by those datasets 
may be more or less vulnerable to the sexually 
associated monkeypox outbreak as a result of 
different partnership patterns and R0 values.
This may also explain the limited observation
of possible sexually associated outbreaks in 
endemic countries previously, although this 
may in part result from insufficient case as- 
certainment. Our sensitivity analysis using only 
case counts within the UK yielded almost identical
results (table S3) and the same approach
could also be applied to other population set- 
tings where sexual partnership data are available. 
Meanwhile, depletion of susceptibles—especially 
those with many partners—may have visible 
effects in finite MSM populations at the coun- 
try level, which can lead to smaller final outbreak 
sizes than those projected by the branching 
process model (fig. S5). Some of the countries 
with early introductions of cases including the 
UK have seen a slowdown in growth of cases 
as of 10 August 2022 (6); depletion of suscep- 
tibles and other factors such as vaccination and 
increased awareness (36) may have contributed 
to these trend changes. We did not consider the 
possibility of degree assortativity or clustering, 
which would lead to more densely clustered 
local subnetworks than those we modeled (37). 
It is plausible that there could be core parts or 
clusters of the non-MSM sexual contact net- 
works over which transmission could be sus- 
tained, which is not captured by modeling 
transmission over the non-MSM partnership 
distribution as a whole. Finally, because of 
the limited sample size of MSM partnerships 
in the Natsal datasets (n = 409), uncertainty 
remains around their R0 values (table S1 and
fig. S2). Our estimates should be viewed as a
qualitative projection rather than precise estimates
of R0. Future empirical evidence from
the current outbreak and estimates of key epi- 
demiological parameters, as well as the ef- 
fectiveness of interventions will inform our 
projections on the current and future epide- 
miology of the monkeypox outbreak. 
NEURODEGENERATION


Two FTD-ALS genes converge on the endosomal
pathway to induce TDP-43 pathology
and degeneration


Wei Shao’, Tiffany W. Todd’, Yanwei Wu, Caroline Y. Jones’, Jimei Tong’, Karen Jansen-West’, 
Lillian M. Daughrity’, Jinyoung Park?, Yuka Koike?, Aishe Kurti?, Mei Yue’, Monica Castanedes-Casey’, 
Giulia del Rosso*, Judith A. Dunmore’, Desiree Zanetti Alepuz', Bjorn Oskarsson*, 

Dennis W. Dickson’”, Casey N. Cook", Mercedes Prudencio™, Tania F. Gendron? , 

John D. Fryer™, Yong-Jie Zhang'2*, Leonard Petrucelli>2* 


Frontotemporal dementia and amyotrophic lateral sclerosis (FTD-ALS) are associated with both a repeat 
expansion in the C9orf72 gene and mutations in the TANK-binding kinase 1 (TBK1) gene. We found that TBK1 
is phosphorylated in response to C9orf72 poly(Gly-Ala) [poly(GA)] aggregation and sequestered into inclusions, 
which leads to a loss of TBK1 activity and contributes to neurodegeneration. When we reduced TBK1 activity 
using a TBK1-R228H (Arg228→His) mutation in mice, poly(GA)-induced phenotypes were exacerbated. These
phenotypes included an increase in TAR DNA binding protein 43 (TDP-43) pathology and the accumulation
of defective endosomes in poly(GA)-positive neurons. Inhibiting the endosomal pathway induced TDP-43
aggregation, which highlights the importance of this pathway and TBK1 activity in pathogenesis. This interplay 
between C9orf72, TBK1, and TDP-43 connects three different facets of FTD-ALS into one coherent pathway. 


Frontotemporal dementia and amyotrophic
lateral sclerosis (FTD-ALS) are character- 
ized by the formation of protein inclusions, 
notably the aggregation of TAR DNA bind- 
ing protein 43 (TDP-43). Aggregates imply 
impaired protein clearance, and several FTD-ALS-associated
genes influence proteostasis,
including TANK-binding kinase 1 (TBK1) (1).
Loss-of-function mutations in TBK1 are risk
factors for FTD-ALS (2), and studies in SOD1
mouse models have suggested that they require
a so-called second hit to become pathogenic
(3-5). Accordingly, some TBK1 carriers also harbor
a mutation in another disease-associated
gene (6, 7). Rare patients who coharbor a TBK1
mutation and a G4C2 repeat expansion in chromosome
9 open reading frame 72 (C9orf72)
(8, 9) present with an earlier age of onset and
a more rapid disease course (7). Thus, TBK1
could modulate C9orf72-associated FTD-ALS
(c9FTD-ALS).

*Corresponding author. Email: petrucelli.leonard@mayo.edu (L.P.);

Fig. 1. pTBK1 accumulates into DPR aggregates in c9FTD-ALS. (A and B) Immunoblot (A)
and quantification (B) of 6-month-old mouse cortical lysates (N = 4 mice). GAPDH,
glyceraldehyde-3-phosphate dehydrogenase; A.U., arbitrary units. (C and D) Images (C) and
quantification (D) of pTBK1 in 6-month-old mouse cortices (N = 9). Scale bar, 20 µm.
(E) Immunofluorescence in 6-month-old mouse cortices. Scale bar, 10 µm. (F and G) Immunoblot (F)
and quantification (G) of HEK293T lysates (N = 4 independent experiments).
GFP monomer: ~27 kDa; GFP-DPRs: high-molecular weight (HMW) smears.
(H) Immunostaining in the cortex of 3.5-month-old GFP and (GA)100-V5 mice and [...] mice.
Scale bars, 20 µm. (I) Immunostaining in healthy and C9-FTD frontal cortices (images labeled
#1 and #2; scale bar, 10 µm) and the hippocampus (images labeled #3; scale bar, 50 µm).
(J) Immunofluorescence in C9-FTD frontal cortices. Scale bar, 10 µm.
DAPI, 4',6-diamidino-2-phenylindole. Data are means ± standard deviations (SDs).
*P < 0.05; ***P < 0.001; ****P < 0.0001; ns, nonsignificant. Statistics by
Student's t tests in (B) and (D) and one-way analysis of variance (ANOVA) in (G).

We used an adeno-associated virus (AAV)-based
C9orf72-(G4C2)149 mouse model of c9FTD-ALS
(10) to determine whether the repeat
expansion influenced TBK1 phosphorylation
at S172, a modification associated with its
kinase-active form. Although total TBK1 levels
were comparable in cortical brain lysates
from C9orf72-(G4C2)149 mice and C9orf72-(G4C2)2
controls, mice expressing the expanded
repeat showed an increase in the ratio of phosphorylated
TBK1 (pTBK1) to total TBK1 (Fig. 1,
A and B). In the brains of C9orf72-(G4C2)149
mice, pTBK1 localized to puncta (Fig. 1, C and
D). We investigated whether these puncta corresponded
to dipeptide repeat (DPR) protein
aggregates. Five DPRs are produced from the
repeat-associated non-ATG translation of the
G4C2 repeat; all five aggregate in the brains
of c9FTD-ALS patients and mice (10-14).
Three DPRs colocalized with pTBK1 in C9orf72-(G4C2)149
mice: poly(GA), poly(GR), and poly(PR)
(Fig. 1E and fig. S1A). These three DPRs are
toxic in vivo (15-17).

We next explored which DPR or DPRs
were responsible for these effects on TBK1
phosphorylation and localization. In HEK293T
cells, green fluorescent protein (GFP)-(GA)100
increased the pTBK1-to-total TBK1 ratio and
coimmunoprecipitated with TBK1, whereas
GFP-(GR)100 and GFP-(PR)100 did not (Fig. 1,
F and G, and fig. S1B). Accordingly, pTBK1
formed puncta in the brains of mice expressing
poly(GA)100-V5 but not in the brains of
mice expressing GFP-(GR)100 or GFP alone
(Fig. 1H). Thus, poly(GA) was responsible for
triggering the phosphorylation and aggregation
of TBK1. Additionally, we observed pTBK1
puncta in the frontal cortex and hippocampus
of C9-FTD patients but not in healthy controls
(Fig. 1I and table S1); these puncta colocalized
with poly(GA) (Fig. 1J and fig. S1C). Puncta were
also detected by a total TBK1 antibody in C9-FTD
(fig. S1D).

TBK1 is activated when the oligomerization
of its interactors triggers its clustering and
transautophosphorylation (18-20). The oligomerization
of poly(GA) could induce TBK1
phosphorylation by a similar mechanism. In
cells, the proline-interrupted poly(GA,;P), which
is unable to form compact aggregates (15),
failed to induce the phosphorylation of TBK1
(Fig. 2, A and B), did not colocalize with pTBK1
(Fig. 2C), and did not coimmunoprecipitate
with TBK1 (Fig. 2F). Moreover, pTBK1 shifted
to the insoluble fraction of cell lysates when
poly(GA) was aggregated (Fig. 2, D and E, and
fig. S2A). The interaction of TBK1 with poly(GA)
was independent of its phosphorylation because
coimmunoprecipitation still occurred when the
S172 locus was mutated (fig. S2B). Thus, TBK1
autophosphorylates after clustering into poly(GA)
aggregates.

Fig. 2. Poly(GA) aggregation sequesters TBK1 into inclusions, which
limits its activity. (A and B) Immunoblot (A) and quantification (B) of
HEK293T lysates (N = 3 independent experiments). [...] (A) indicates nonspecific bands.
(C) Immunofluorescence of HEK293T cells. Scale bar, 10 µm. (D and E) Immunoblot (D)
and quantification (E) of HEK293T lysate fractions (N = 3). S, soluble; Ins, insoluble.
[...] (D) indicates higher exposure. (F) Coimmunoprecipitation analysis of
HEK293T lysates. IP, immunoprecipitation. (G and H) Immunoblot (G) and
quantification (H) of HEK293T lysates (N = 3). In all blots, GFP monomer:
~27 kDa; GFP-DPRs: ~50 kDa unaggregated, HMW oligomerized. Data are
means ± SDs. *P < 0.05; **P < 0.01; ****P < 0.0001; ns, nonsignificant.
Statistics by one-way ANOVA.

We hypothesize that as long as poly(GA) in- 
clusions persist in diseased neurons, pTBK1 
clusters cannot disassemble. Because TBK1 
regulates the autophagic clearance of aggre- 
gates (18), its sequestration could increase 
poly(GA) inclusion levels. Replenishing TBK1 
could help alleviate this burden. When we overexpressed
wild-type TBK1 in cells expressing
GFP-(GA)100, we saw a significant increase in
the phosphorylation of the autophagy adaptor
p62, a known target of TBK1 (18), and a decrease
in poly(GA) aggregation in immunoblots
(Fig. 2, G and H). This decrease was not observed
when we overexpressed TBK1-R228H
(Arg228→His) (Fig. 2, G and H), an ALS-associated
partial loss-of-function mutant with reduced
autophosphorylation activity (5). As expected,
overexpression of wild-type TBK1 increased
the ratio of pTBK1 to total TBK1 and the ratio of
phosphorylated p62 to total p62, whereas TBK1-R228H
only slightly increased these ratios (Fig. 2,
G and H). The TBK1-R228H mutation did not
hinder its interaction with poly(GA) (Fig. 2F).

If TBK1 is sequestered by poly(GA), further
impairing TBK1 activity would be detrimental
to cells. Accordingly, mutations that
reduce TBK1 activity would exacerbate defects
in poly(GA)-positive neurons, which explains
why patients with mutations in both C9orf72
and TBK1 show a more severe disease course
(7). We modeled this double-mutant condition
by using an established AAV-based protocol
(10, 15) to express GFP or (GA)100-V5 in TBK1-R228H
knockin (5) and wild-type mice. Mice
expressing (GA)100-V5 developed a progressive
hindlimb clasping defect (Fig. 3A) and
showed impaired performance in the rotarod
(fig. S3A), hanging wire (Fig. 3B), and fear
conditioning tests (fig. S3B). These phenotypes
were enhanced in TBK1-R228H mice
(Fig. 3, A and B, and fig. S3). Mice expressing
GFP behaved normally (Fig. 3, A and B, and
fig. S3), consistent with the TBK1-R228H model
being phenotypically normal (5).

Fig. 3. Reducing TBK1 activity exacerbates the behavioral, neurodegenerative, and
pathological effects of poly(GA) in mice. (A) Hindlimb clasping scores (N = 9, 10, 16,
and 14 mice at each time point). WT, wild-type. (B) Hanging wire test in 3-month-old
mice (N = 9, 10, 16, and 14). (C) Whole-mouse brain weight (N = 12, 15, 16, and 14).
(D and E) Images (D) and quantification (E) of NeuN in the mouse cortex (N = 11, 12, 13,
and 12). Scale bar, 200 µm. (F and H) Images (F) and quantification (H) of pTBK1 in the
mouse cortex (N = 14 each). Scale bar, 20 µm. (G and I) Images (G) and quantification (I)
of poly(GA) in the mouse cortex (N = 14 and 16). Scale bar, 20 µm. (J to L) Immunofluorescence (J)
and quantification [(K) and (L)] in the mouse cortex (N = 6 each). Scale bar, 10 µm.
NT, nontransduced or poly(GA)-negative. Except in (A) and (B), mice were 3.5 months old.
Data are means ± SDs. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001; ns,
nonsignificant. Statistics by two-way ANOVA in (A), Student's t test in (H) and (I), and
one-way ANOVA in remaining panels.

Mice expressing (GA)100-V5 showed reduced
total brain weight (Fig. 3C) and cortical neuronal
loss (Fig. 3, D and E). Both phenotypes were
enhanced on the TBK1-R228H background
(Fig. 3, D and E), where the cortex was noticeably
thinner (Fig. 3D). Neuroinflammation
levels often correlate with neurodegeneration
in mice, and as such, expression of (GA)100-V5
increased the expression of the microglial
markers Iba1 (fig. S4A) and CD68 (fig. S4B)
and the astroglial marker GFAP (fig. S4C). This
was enhanced on the TBK1-R228H background
(fig. S4, A to C).

Fig. 4. Poly(GA) induces endosomal defects that trigger TDP-43 proteinopathy,
and a reduction in TBK1 activity intensifies this process. (A and B) Images
(A) and quantification (B) of early endosome antigen 1 (EEA1) in the mouse
cortex (N = 6, 6, 13, and 13 mice). Scale bar, 20 µm. (C and D) Immunofluorescence (C)
and quantification (D) in the mouse cortex (N = 6 each). Scale bar,
10 µm. (E and F) Images (E) and quantification (F) of pTDP-43 in the mouse
cortex (N = 6, 6, 14, and 13). Scale bar, 20 µm. (G and H) Immunofluorescence
(G) and quantification (H) in the mouse cortex (N = 6 each). Scale bar, 10 µm.
(I to K) Immunofluorescence (I) and quantification [(J) and (K)] in HEK293T cells.
Scale bar, 10 µm. (L to N) Immunofluorescence (L) and quantification [(M) and
(N)] in HEK293T cells ± apilimod treatment (N = 3 independent experiments). Scale bar,
10 µm. All mice were 3.5 months old. Data are means ± SDs. DMSO, dimethyl
sulfoxide. *P < 0.05; ***P < 0.001; ****P < 0.0001; ns, nonsignificant. Statistics by
Student's t test in (J), (K), (M), and (N) and by one-way ANOVA in remaining panels.

As in SOD1 models (3, 5), our double-mutant 
study is consistent with the idea that reduc- 
ing TBK1 activity in neurons cell-autonomously 
exacerbates FTD-ALS phenotypes. Loss of glial 
TBK1 may be beneficial (3, 5), however, because 
altered immune responses could modulate 
c9FTD-ALS (21). We next investigated whether
TBK1-R228H affected TBK1-regulated immune
response pathways (18) in our mice. Poly(GA)
increased the expression of two interferon-stimulated
genes, IFIT1 (fig. S4D) and MX1 (fig.
S4E), but their levels were comparable in wild-type
and TBK1-R228H mice. Thus, the effects
of TBK1-R228H on poly(GA)-induced defects 
may not involve the interferon pathway, but 
further experiments are required to confirm 
this hypothesis. 

We observed pTBK1 puncta in the cortex 
of both wild-type and TBK1-R228H mice expressing
(GA)100-V5, although pTBK1 puncta
were less abundant in the TBK1-R228H mice
because of the mutant's reduced autophosphorylation
activity (Fig. 3, F and H). Poly(GA)
inclusions were increased in TBK1-R228H 
mice compared with wild-type mice (Fig. 3, 
G and I). These inclusions were intracellular 
and neuronal (fig. S4F). There was also a cor- 
responding increase in the number of puncta 
detected by a total TBK1 antibody in the TBK1- 
R228H mice (fig. S4, G and H). Total TBK1 and 
pTBK1 colocalized with poly(GA) aggregates
in mice expressing (GA)100-V5 (fig. S4I and
Fig. 3, J to L). Thus, the R228H mutation
did not prevent TBK1 sequestration but did 
reduce its activity, impairing its ability to 
induce aggregate clearance and increasing 
poly(GA) pathology. 

Loss of TBK1 impairs endosome matura- 
tion in human motor neurons (22). We asked 
whether TBK1 sequestration also impaired 
this pathway. Mice expressing (GA)100-V5
developed enlarged early endosomes in the
cortex (Fig. 4A and fig. S5A) that were more
abundant on the TBK1-R228H background
(Fig. 4, A and B). We also observed aberrant
Rab7-positive late endosomes in mice expressing
(GA)100-V5 (fig. S5, B to D). These endosomal
defects occurred in cells that contained
poly(GA) aggregates (Fig. 4, C and D), which 
suggests a causal link. 

TBK1-mediated endosomal defects induce 
TDP-43 pathology in human motor neurons 
(22). Rare pTDP-43 inclusions are also observed 
in mice expressing poly(GA) (15). We explored
whether our double-mutant mice would show
more pronounced pTDP-43 pathology. In fact,
pTDP-43 inclusions were more abundant in
mice expressing (GA)100-V5 on the TBK1-R228H
background (Fig. 4, E and F). These inclusions 
occurred in poly(GA)-positive cells (Fig. 4, G and 
H), which suggests that poly(GA) aggregates, 
endosomal defects, and TDP-43 pathology are 
linked. 

To determine whether endosomal defects 
are sufficient to induce TDP-43 pathology, we 
used a constitutively active Rab5-Q79L mutant 
to induce enlarged early endosomes in cells 
(Fig. 4, I and J, and fig. S6, A to C). In a subset 
of these cells, exogenously expressed TDP-43, 
but not GFP, showed cytoplasmic aggregation 
and reduced nuclear localization (Fig. 4, I and 
K, and fig. S6A). We saw similar results when 
we used the phosphatidylinositol-3-phosphate 
5-kinase inhibitor apilimod to disrupt endo- 
some maturation (Fig. 4, L to N). Rab5-Q79L 
also induced endogenous TDP-43 aggrega- 
tion but not its nuclear depletion (fig. S6, B 
and D). Therefore, endosomal defects in- 
duce TDP-43 aggregation, but TDP-43 nu- 
clear depletion may be downstream of initial 
cytoplasmic seeding. Defects in nucleocytoplas- 
mic transport predate TDP-43 mislocalization 
in c9FTD-ALS (23), and TDP-43 aggregates se- 
quester nuclear pore components and related 
factors (24). 

We suggest a model in which poly(GA) in- 
clusions sequester TBK1, leading to a reduc- 
tion in TBK1 function that disrupts endosome 
maturation and induces TDP-43 aggregation. 
Although our work focuses on C9orf72 and
TBK1, mutations in multiple FTD-ALS genes
affect endosomes or lysosomes in cells (25-28).
Enlarged endosomes are also seen in ALS- 
like mouse models and sporadic ALS tissue 
(27, 29). Furthermore, filaments of lysosomal 
transmembrane protein 106B (TMEM106B) 
are a pathological hallmark in multiple neuro- 
degenerative diseases, including FTD-ALS (30-32), 
and variants in TMEM106B are associated with
FTD (33). Several disease processes therefore 
converge on the endolysosomal pathway, the 
disruption of which could sensitize cells to 
pathogenic insults, like protein aggregation, and 
act as a primary driver of TDP-43 proteinopathy 
and neurodegeneration.

LIFE SCIENCE TECHNOLOGIES


new products: microscopy-imaging 


Fixed-Focus Radiation Resistant Lenses 
For close-up inspection tasks in high-radiation 
environments, Resolve Optics offers both
a standard range and customer-specified
fixed-focus radiation resistant
lenses. Manufactured to the highest-quality
standards from cerium-doped glass, Resolve 
Optics’ range of fixed-focus non-browning 
lenses can withstand radiation exposure of up to 100 kilograys 
(kGy) (100,000,000 rads) and temperatures up to 55°C without 
discoloration. Starting from these proven fixed-focus lens designs, 
Resolve Optics is also able to quickly design and produce custom 
fixed-focus radiation resistant lenses fully optimized to yield the 
best results from your camera or sensor. 

Resolve Optics 

For info: +44-(0)-1494-777100 
www.resolveoptics.com/radiation-resistant-lenses-2 


Ultraviolet Dyes for Multiplex Flow Cytometry 

Bio-Rad Laboratories introduces three StarBright Dyes designed 

for use with a 355-nm UV laser in flow cytometry applications. 
StarBright UltraViolet 510, 665, and 795 Dyes offer exceptional 
brightness with narrow excitation and emission profiles for precise 
resolution, expanding Bio-Rad’s range to provide greater flexibility in 
multicolor flow cytometry panels. StarBright Dyes offer researchers 
unique fluorescent nanoparticles conjugated to highly validated flow 
antibodies, which are compatible with most experimental protocols 
and flow cytometers, including Bio-Rad’s ZE5 Cell Analyzer and 

S3e Cell Sorter. The dyes are resistant to photobleaching and are 
highly stable with no loss of signal in fixation; their minimal lot-to- 
lot variation helps to ensure consistent staining and deliver reliable 
results. 

Bio-Rad Laboratories 

For info: 1-800-424-6723 

www.bio-rad-antibodies.com 


All-in-One Microscope 

The APX100 system combines the ease of use of an all-in-one 
microscope with publication-quality images to make life science 
research more efficient. Designed for labs and imaging core facilities, 
the microscope features a small footprint, built-in anti-vibration 
mechanism, and light-shielded optics that enable it to be placed 
anywhere, even in a brightly lit room. Using the APX100 imaging 
system is simple—load the sample, close the lid, and press a button. 
The system then automatically acquires an overview macro image 
while the built-in AI software locates and displays the samples on
the monitor, so the user can immediately begin capturing images. 
The system's autofocus is up to 12 times faster than conventional 
autofocus methods, enabling the user to quickly find the ideal 
imaging plane. APX100 is also equipped with a new gradient contrast 
method, which makes it possible to capture sharp, high-contrast 
images of live cells or thin, unstained tissue sections without the 
need for specialized differential interference contrast or phase- 
contrast optics. 

Olympus 

For info: 1-800-622-6372 
www.olympus-lifescience.com/en/solutions-based-systems/apx100

Slide Loader System

Prior Scientific is pleased to introduce its newest automated slide 
loader, the SL160. This 160-slide capacity loader combines reliability, 
easy setup, and high capacity to provide automated slide loading 

to a wide variety of upright microscopes or with the use of Prior's 
OpenStand microscope platform. The SL160 allows for precise, safe 
loading of slides for the pathology, cytology, and screening markets. 
The system has few moving parts; can be set up in as little as 30 
min; and thanks to its several integrated sensing systems, handles 
your slides with the utmost care. By loading four slides at a time, 
the SL160 allows for much higher scanning throughput. The optional 
preview station with integral illuminator is a scan time-saver and can 
be used to predetermine areas of interest. Prior's ProScan system 
regulates the loader, scanning stage, and the optional OpenStand 
automated microscopy platform for a one-controller, multiaxis 
solution that simplifies software integration. 

Prior Scientific 

For info: 781-878-8442 

www.prior.com 


Universal UV-Visible Detector for Flow Chemistry 

The Flow-UV from Uniqsis is a universal inline UV-visible detector 
that sets a new standard for real-time monitoring of continuous flow 
applications. The compact, high-resolution Flow-UV charge-coupled 
device (CCD) array detector does not require calibration or routine 
servicing. In contrast to conventional deuterium UV lamps, the 
xenon flash lamp source used in the Flow-UV has a lifetime of up to 
10 years. Once you have set up and saved a method using the Flow- 
UV control software, pressing a single button is all that is required to 
start acquisition. To assure linearity of response, Flow-UV allows you 
to select up to five wavelengths over which to monitor a reaction, 
thereby ensuring detector saturation is avoided. Absorbance is 
plotted against time using the system control software. The control 
software may be configured to automatically record a background 
spectrum at the beginning of each experiment. 

Uniqsis

For info: +44-(0)-845-864-7747 


www.uniqsis.com/paproductsdetail.aspx?ID=Flow-UV 


CMOS Camera 

Atik Cameras has launched ChemiMOS, the first in a series of high- 
resolution complementary metal oxide semiconductor (CMOS) 
cameras. This 9-megapixel camera, with setpoint cooling of -20°C, 
has been optimized for long exposures. Hours of exposure time 
have previously been available only with charge-coupled device 
(CCD) technology, but are now possible with CMOS technology 
thanks to the ChemiMOS zero-amp glow and low-noise design. The 
square format, K-grade sensor is guaranteed for continuous use, 
while the 3000 x 3000 resolution and 3.76-µm pixel size are perfect
for multiple scientific applications, including chemiluminescence, 
Western blotting, and gel documentation. Benefits include low read 
noise of <2 electrons (e-) rms and a pixel full-well capacity of >50,000 
e-, allowing for unprecedented dynamic range. Cooling is optimized 
to minimize the dark current to around 0.005 e-/p/s, without the 
need for extreme temperatures, allowing for superior end user 
design. 

Atik Cameras 

For info: +44-(0)-1603-740397 

www.atik-cameras.com 


Electronically submit your new product description or product literature information! Go to www.science.org/about/new-products-section for more information.

UNIVERSITÄT HEIDELBERG
ZUKUNFT SEIT 1386

Heidelberg University is a comprehensive university with a strong 
research profile and international visibility that has been successful in 
every application round of the Excellence Competition of the German 
federal and state governments since 2007. With just under
30,000 students and 8,400 employees, many of them top-level
researchers, it is a globally renowned institution with outstanding 
economic significance for the Rhine-Neckar Metropolitan Region. 


Effective 1 October 2023, the university is looking for a new 


RECTOR (F/M/O) 


To fill this responsible and highly visible position, the university is looking 
for a personality with excellent academic credentials, strong leadership 
skills and an internationally distinguished profile who has acquired several 
years of management experience in a scientific-academic or university 
setting and demonstrates a firm understanding of the special qualities 
and needs of universities or other scientific-academic institutions. 
Applicants should be able to provide effective leadership for Heidelberg 
University, in cooperation with the University Council, the Senate and 
the University Administration, and to shape the university’s present and 
future in an innovative and integrative manner. This includes overseeing 
the ongoing development of a sustainable institutional strategy, based on 
Heidelberg’s self-definition as a comprehensive university, that can be 
embraced by the university as a whole. The new Rector (f/m/o) will be 
expected to carry on the university’s successful cooperation with other 
strong research universities and research alliances, and to further enhance 
its regional, national and international standing in order to ensure its 
continued success. 


The Rector (f/m/o) represents the university both internally and externally. 
The responsibilities of this position call for a strong personality with an 
excellent academic and personal reputation and outstanding leader- 
ship ability, decisiveness, commitment, social skills and communication 
skills. Visionary power should be accompanied by the ability to translate 
vision into action. 


The powers, conditions for appointment and employment status of
the Rector (f/m/o) are set forth in § 17 of the Baden-Württemberg
Higher Education Act (Landeshochschulgesetz). The Rector (f/m/o) is
nominated by the Selection Committee and elected in a joint meeting
of the University Council and Senate. If the legal requirements are met,
the Minister-President of Baden-Württemberg will confer upon the
Rector (f/m/o) the temporary status of a civil servant, unless the contract
stipulates a fixed term of service. The term of office is six years, and
the incumbent may be re-elected for a second term. The Rector (f/m/o)
receives a W3 salary and is eligible for bonuses.


Please submit your application by 31 October 2022 to the Chair of the
University Council, Prof. Dr. Hanns-Peter Knaebel, Geschäftsstelle
Universitätsrat, Seminarstr. 2, 69117 Heidelberg. If you would like to apply
by e-mail, please combine all application documents in a single PDF
file and send it to universitaetsrat@uni-heidelberg.de. We ask for your
understanding that received application documents will not be returned.


Heidelberg University stands for equal opportunities and diversity. 
Qualified female candidates are especially invited to apply. Persons with 
severe disabilities will be given preference if they are equally qualified. 
Information on job advertisements and the collection of personal data is
available at www.uni-heidelberg.de/en/job-market.

Global Scholar Recruitment Campaign


City University of Hong Kong (CityU) is one 
of the world’s leading universities, known 
for innovation, creativity and research. We 
are now seeking exceptional scholars to 
join us as Assistant Professors/Associate 
Professors/Professors/Chair Professors (on 
substantiation-track) in all academic fields with special focuses on 
One Health, Digital Society, Smart City, Matter, Brain, and related 
interdisciplinary areas. Research fields of particular interest include,
but are not limited to:

* biomedical science and engineering
* veterinary science
* computer science and data science
* neuroscience and neural engineering
* bio-statistics and AI-healthcare
* smart/semi-conductor manufacturing
* AI/robotics/autonomous systems
* aerospace and microelectronics engineering
* energy generation and storage
* digital business and innovation management
* fintech and business analytics
* computational social sciences
* digital humanities
* digital and new media
* law and technology
* private law
* healthy, smart and sustainable cities


Successful candidates should have a demonstrated ability to build a 
world-class research programme related to CityU's strategic research 
areas, plus a commitment to education and student mentorship. 
Candidates must possess a doctorate in their respective field by the 
time of appointment. 


Outstanding faculty joining the University will be considered for 
nomination of the Global STEM Professorship Scheme sponsored by 
the Government of the Hong Kong Special Administrative Region, 
and may be provided with subsidy for their research teams and for 
setting up laboratories. 


Please visit Colleges, Schools and Departments in CityU at 
https://www.cityu.edu.hk/academic/colleges-schools-and-departments 


City University of Hong Kong is an equal opportunity employer. We are committed to the 
principle of diversity. Personal data provided by applicants will be used for recruitment
and other employment-related purposes. 


Worldwide recognition ranking #54 (QS 2023), and #4 among top 50 universities under age 50 (QS
2021); #1 in the World's Most International Universities (THE 2020); #1 in Automation &
Control/Electrical & Electronic Engineering/Materials Science & Engineering/Metallurgical
Engineering/Nanoscience & Nanotechnology and #3 in Telecommunication Engineering in Hong
Kong (GRAS 2022); and #41 Business School in the World and #4 in Asia (UT Dallas 2017 to 2021)


ROBERT E. MAYTAG CHAIR OF ORNITHOLOGY 
Associate Professor — Professor 


The University of Miami’s Department of Biology invites exceptional ornithologists 
to apply for the Robert E. Maytag Chair of Ornithology. In addition to being 
outstanding, internationally recognized scientists passionate about birds, applicants 
must be excellent teachers with strong commitments to undergraduate
and graduate education. This prestigious chair includes an annual budget to support
research. Applications will be considered at the Associate Professor and Professor 
ranks with an expected start date of August 15, 2023. 


We welcome applications from candidates who would enhance or complement our 
existing departmental programs in Biodiversity & Global Change, Tropical Ecology 
& Evolution, Development & Disease, Neuroscience & Behavior, and Microbiome 
Biology & Species Interactions. To be eligible for this tenure track appointment, 
candidates must hold a Ph.D. in Biology or a related field and have a strong record 
of research accomplishments and research funding. The successful candidate will 
be expected to maintain a vigorous, externally funded research program, to teach
at both the undergraduate and graduate level, including regularly teaching an
undergraduate course in ornithology, and to be committed to professional engagement
that promotes diversity, equity, and inclusion.


Interested applicants should submit a cover letter describing the interactions they 
foresee with existing research programs in the Department of Biology or other units 
at the University of Miami, a curriculum vitae, and a 1-2 page statement describing
their commitment to increasing diversity, equity, and inclusion through research, 
teaching, and service. 


For reference, information about the University of Miami’s pursuit of Racial Justice 
can be found here: https://president.miami.edu/inclusion/index.html. 


Application documents should be submitted as a single pdf file online via the UM 
Careers website: https://umiami.wd1.myworkdayjobs.com/UMFaculty. 


Following initial review of applications, long-listed candidates will be contacted by
email and requested to solicit three letters of recommendation, including one from 
a former mentee. 


To receive full attention, application materials must be received by October 15th, 
2022. More information about the Department and University can be found at 
https://www.biology.as.miami.edu. Inquiries should be directed to the Search Chair 
at: maytag.chair.search@miami.edu. 


RUTGERS 


ASSISTANT PROFESSOR - TENURE-TRACK FACULTY


The Department of Cell Biology and Neuroscience (CBN) at Rutgers University 
invites applications for a tenure-track faculty position at the Assistant Professor 
level to develop an innovative research program focused on the role of
non-neuronal cells in the function of the central nervous system. Research approaches
may involve a variety of animal or cellular models and may be directly or indirectly 
relevant to human diseases. 


The CBN Department is part of the School of Arts and Sciences (SAS) and is 
based on the New Brunswick/Piscataway campus, located within one hour of 
New York City and Philadelphia. The CBN Department is home to a collegial 
faculty with broad research interests, including molecular and cellular biology, 
neurodevelopment, immunology, and systems neuroscience. Opportunities for
interdisciplinary collaboration exist within SAS Departments and Institutes, the 
Rutgers Robert Wood Johnson Medical School, as well as the Rutgers University 
Newark campus and Princeton University. Rutgers University offers excellent 
facilities, faculty mentoring, and competitive start-up packages. 


Applicants must hold a Ph.D., M.D., or equivalent degree and have a minimum of 
three years of postdoctoral training in a relevant field. The successful candidate 
is expected to maintain a productive, extramurally funded research program, to 
train pre- and postdoctoral fellows, and to teach undergraduate courses for the 
CBN major. 


Rutgers University has one of the most diverse student bodies in the nation and 
boasts several programs established to nurture equity and inclusion. We encourage 
applications from women and members of communities who are underrepresented 
in the sciences and will evaluate the potential of the applicant to mentor and 
empower our students. 


Interested individuals are encouraged to apply by supplying the following: 1) a 
curriculum vitae; 2) a brief statement of research plans; 3) a statement summarizing 
their approach to promoting diversity and inclusion; 4) a statement describing 
teaching and mentoring interests and experience; 5) the contact information of three 
individuals who can provide letters of reference.


Applicants should submit no later than December 15, 2022 at
https://jobs.rutgers.edu/postings/179846. Late applications will be considered only
if the position remains available.


Rutgers University is an equal opportunity/affirmative action employer. 


RUTGERS


ASSISTANT PROFESSOR POSITION IN GENETICS 


The Department of Genetics in the School of Arts and Sciences at 
Rutgers, The State University of New Jersey, invites applications for an 
outstanding tenure-track Assistant Professor who studies how genetic 
and epigenetic variation affects metabolic pathways and how metabolic 
feedback can alter genomic function. Applicants who apply these 
approaches to cancer are especially encouraged to apply. Outstanding 
researchers in other areas that complement our existing strengths will 
also be considered. 


New faculty may leverage partnerships with the Cancer Metabolism and 
Immunology Program at the Cancer Institute of New Jersey, the Institute 
for Food, Nutrition, and Health, the Environmental and Occupational
Health Sciences Institute, and others in our vibrant and interactive 
research community, topping $872 million in yearly research funding 
(FY2022). Rutgers Life and Biomedical Sciences includes over 200 
faculty members across multiple outstanding departments and institutes. 
See http://genetics.rutgers.edu for more information. 


Rutgers University hosts one of the most diverse student bodies in the 
United States. We are committed to diversity, equity, and inclusion, and we 
especially encourage applications from groups underrepresented in STEM. 


Candidates will have a Ph.D. in genetics or a closely related field and/or
M.D., a record of significant research, and be expected to teach in
cancer, genetics, or informatics. Applicants should submit a detailed CV, 
a 2-3 page research and teaching statement, contact information for three 
professional references, and a separate statement describing how your 
research, teaching, and service will contribute to Rutgers’ commitment 
to enhancing diversity and inclusiveness. Review of applications will 
begin on October 15th and continue until the position is filled. Submit at 
https://jobs.rutgers.edu/postings/179660. 


All qualified applicants will receive consideration for employment without 
regard to race, color, religion, sex, sexual orientation, gender identity or 
expression, national origin, disability or protected veteran status. 


RUTGERS 


DEPARTMENT OF MOLECULAR BIOLOGY AND BIOCHEMISTRY 


The Center for Advanced Biotechnology and Medicine (CABM) and the Department 
of Molecular Biology and Biochemistry (MBB) at Rutgers, The State University of 
New Jersey, invite applications for a tenure-track Assistant Professor position. The 
position requires a PhD in molecular biology or a related field by the appointment 
date (9/1/2023). The successful candidate will be responsible for teaching 
undergraduate and graduate students, and leading a creative research program in 
the general areas of Biochemistry or Molecular Biology. Applicants who study 
metabolism in health and disease using biochemical, metabolomic, and genetic 
approaches are especially encouraged to apply. Outstanding researchers in other 
areas will also be considered. The research program of the new faculty member will 
complement existing strengths in the study of cancer, aging and gene regulation. 


The successful candidate will be a full member of both CABM, which will 
administer their extramural research program, and the MBB Department, their 
tenure home. CABM is an interdisciplinary scientific institute on the Rutgers Busch 
Campus, within one hour of New York and Philadelphia, in a highly desirable area. 
Outstanding local opportunities exist for scientific collaboration, and the appointed 
candidate will additionally benefit from access to graduate students from our strong 
interdepartmental programs in molecular biosciences. 


Rutgers University has one of the most diverse student bodies among institutions of 
higher learning in the country, and boasts several programs established to nurture 
equity and inclusion that enhances the achievement of excellence by faculty 
members and students. We encourage applications from women and members 
of communities who are underrepresented in the sciences, and will evaluate the 
potential of the applicant to mentor and empower minoritized students. 


The position is highly competitive with regard to start-up funds, laboratory space 
and salary. Interested individuals should apply online with the following materials: 


* 1-page cover letter

* Curriculum vitae 

* 2-3 page research plan 

* Contact information for three scientists who can provide letters of reference 

* 1-page statement discussing your past and/or potential contributions to diversity 


Link to apply: https://jobs.rutgers.edu/postings/181348 
Applications should be submitted by December 1, 2022. 


Rutgers University is an equal opportunity/affirmative action employer. 
Brain Tumor Basic Science Faculty Position in the Department of Neurosurgery
And Huntsman Cancer Institute, University of Utah 
An NCI-Designated Comprehensive Cancer Center 


The Department of Neurosurgery and the Huntsman Cancer Institute are seeking an exceptional Basic Science/Early Translational candidate with a focused 
interest in brain tumors to join our Faculty at the Assistant or Associate Professor level (tenure track). The selected individual will be an accomplished 
academic with a track record of significant research funding and with a history of successful collaboration. Their scientific program will be housed within 
HCI and will provide the opportunity to conduct collaborative work among the neuro-oncology community at HCI and the University of Utah in concert with 
the clinician scientists within the larger department of Neurosurgery. These combined entities offer a robust research environment, and the candidate will be 
expected to demonstrate a strong commitment to national and international collaborative research and to initiate and help facilitate interdisciplinary projects 
within our Center for Neurologic Cancers. 


We are seeking a junior or mid-career investigator with innovative basic or translational cancer biology programs that emphasize mechanistic approaches to the 
understanding and treatment of primary and/or metastatic brain tumors. Areas of interest include signal transduction, stem cells, gene regulation/transcription, 
chromatin/epigenetics, genome stability/DNA repair, cancer metabolism, cancer genetics, metastasis, tumor immunology, pediatric/youth cancers, target 
validation, drug discovery/validation, epigenetics & gene expression, DNA damage & repair, cancer initiating cells, and mechanisms of therapy resistance. 


HCI and the University Health Sciences Center provide access to state-of-the-art equipment and services through exceptional Core Facilities (see www.cores.utah.edu) 
that enhance both discovery and translational science. HCI also offers state-of-the-art laboratories including a new 225,000 sq. ft. research building. For more
information about HCI visit www.huntsmancancer.org.


Applicants for Assistant Professor are expected to hold a PhD or MD/PhD (or equivalent), to have received appropriate postdoctoral training and to have a 
track record of research impact and productivity. Applicants at the Associate Professor level should additionally have a proven record of independent funding 
and innovative research. HCI particularly encourages and welcomes applications from physician-scientists. Highly competitive recruitment packages are 
available with appointment and rank in the Department of Neurosurgery at the University of Utah coupled with an appointment as an HCI Investigator. 


Candidates should submit a curriculum vitae, a cover letter containing a description of professional experience [including scientific accomplishments, 
leadership responsibilities (for senior candidates) and 3 references], and a 3-page research plan. The position will remain open until filled, but applications 
received before December 12, 2022 will receive priority for review.


To apply online, please visit the following link: http://utah.peopleadmin.com/postings/110855 
Or, send to: Huntsman Cancer Institute | Attn: Recruitment Office 
2000 Circle of Hope, Salt Lake City, UT 84112-5550 
Email: hci.recruitment@hci.utah.edu 


The University of Utah is an Affirmative Action/Equal Opportunity employer and does not discriminate based upon race, national origin, color, religion, sex, age, 
sexual orientation, gender identity/expression, status as a person with a disability, genetic information, or Protected Veteran status. 


UT Southwestern Medical Center


TENURE-TRACK POSITION - DEPARTMENT OF PHYSIOLOGY


THE UNIVERSITY OF TEXAS SOUTHWESTERN MEDICAL CENTER


The Department of Physiology invites outstanding scientists with Ph.D., M.D.,
or equivalent degrees to apply for tenure-track faculty positions at the level of
Assistant Professor. Candidates who bring innovative approaches to the study
of any under-explored/unexplored questions broadly related to physiology
are encouraged to apply. The scientific excellence of the candidates is more
important than the specific area of research. These positions are part of the
continuing growth of the Department at one of the country’s leading academic
medical centers. They will be supported by significant laboratory space,
competitive salaries, state-of-the-art core facilities and exceptional start-up
packages. The University of Texas Southwestern Medical Center is the scientific
home to six Nobel Prize laureates since 1985, 25 members of the National
Academy of Sciences, and 17 members of the National Academy of Medicine.
UT Southwestern conducts more than 5,800 research projects annually
totaling more than $554 million. Additional information about the Department
of Physiology can be found at
http://www.utsouthwestern.edu/education/medical-school/departments/physiology/index.html.


Applicants should submit a CV, a brief statement of current and proposed
research, and a summary of your two most significant publications describing
the importance of the work (100-150 words each). Please arrange to have
three letters of recommendation sent on your behalf. All items should
be submitted to: http://academicjobsonline.org/ajo/jobs/22277. Completed
applications will be reviewed starting November 1, 2022. You may email
questions to ron.doris@utsouthwestern.edu.


UT Southwestern Medical Center is committed to an educational and working
environment that provides equal opportunity to all members of the University
community. In accordance with federal and state law, the University prohibits
unlawful discrimination, including harassment, on the basis of: race; color;
religion; national origin; sex, including sexual harassment; age; disability;
genetic information; citizenship status; and protected veteran status. In
addition, it is UT Southwestern policy to prohibit discrimination on the basis
of sexual orientation, gender identity, or gender expression.


Shenzhen Institute of Advanced Technology
Chinese Academy of Sciences


Established in partnership between the Chinese Academy of Sciences and the
Shenzhen Municipal Government, the Shenzhen Institute of Advanced Technology
(SIAT) is a newly-created university with an objective to become the world's
preeminent institute for emerging science and engineering programs. SIAT is
equipped with state-of-the-art teaching and research facilities and is dedicated
to cultivating international, visionary, and interdisciplinary talents while
delivering research support to pursue innovation-driven development.


SIAT is located in Shenzhen, also known as the “Silicon Valley of China,” a
modern, clean, and green city, well-known for its stunning architecture, vibrant
economy, and its status as a leading global technology hub. SIAT is seeking
applications for faculty positions of all ranks in the following academic
programs: Computer Science and Engineering, Bioinformatics, Robotics, Life
Sciences, Material Science and Engineering, Biomedical Engineering,
Pharmaceutical Sciences, Synthetic Biology, Neurosciences, etc. SIAT seeks
individuals with a strong record of scholarship who possess the ability to
develop and lead high-quality teaching and research programs. SIAT offers a
comprehensive benefits package and is committed to faculty success throughout
the academic career trajectory, providing support for ambitious and world-class
research projects and innovative, interactive teaching methods.


Further information: 


online @sciencecareers.org 
GROUP LEADER IN NANOMEDICINE


The Nanoscience Cooperative Research Center CIC nanoGUNE 
(www.nanogune.eu), created with the mission of conducting world-class 
nanoscience research for the competitive growth of the Basque Country, 
member of the Basque Research and Technology Alliance (BRTA), and 
recognized by the Spanish Research Agency as a “Maria de Maeztu” center 
of excellence for the period 2022-2025, is currently looking for a Group 
Leader in Nanomedicine. 


We invite applications for a group-leader position in the field of 
nanomedicine, with a focus on medical diagnostics, therapeutics, and/or 
device technology, including biomedical engineering. Of particular interest 
are research areas with connection to technology and with applications of 
interest for the medical community. A thematic overlap with at least some 
of nanoGUNE’s already existing research groups would be desirable. 


Candidates should have (i) an outstanding track record of impactful work 
in nanomedicine and (ii) the potential to attract public and private funding 
at the highest level. A track record of successful collaborative research with 
companies and/or medical practitioners will be highly valued. Proficiency
in spoken and written English is compulsory. 


In addition to a start-up package, we offer an international and competitive 
environment, an excellently equipped state-of-the-art infrastructure, 
and the possibility to perform top-class research. We promote teamwork 
in a diverse and inclusive environment, and we welcome applicants 
without regard to disability, gender, nationality, race, religion, or sexual 
orientation. Female candidates are particularly encouraged to apply. 


Candidates should forward their CV, a summary of research interests, and 
a list of at least three references through the form that is available at
www.nanogune.eu/careers. The selected candidate will be offered a position as
Ikerbasque Research Associate or Ikerbasque Research Professor. 


Closing Date: 31 October 2022 


Science Careers helps you advance your career. Learn how!


* Register for a free online account on ScienceCareers.org.
* Search hundreds of job postings and find your perfect job.
* Sign up to receive e-mail alerts about job postings that match your criteria.
* Upload your resume into our database and connect with employers.
* Watch one of our many webinars on different career topics such as job searching, networking, and more.


Visit ScienceCareers.org 
today — all resources are free 
Find your next job at ScienceCareers.org


Whether you’re looking to get ahead, get into, or just plain get
advice about careers in science, there’s no better or more trusted
authority. Get the scoop, stay in the loop with Science Careers.
ASSOCIATE OR FULL PROFESSOR
Neuroscience of Audition, Speech and Language 


CHARGE: The successful candidate is expected to undertake research in the area of audition, speech and language at the highest 
national and international levels. In this context he/she will use methods of human neuroscience (e.g. MRI, MEG, EEG, TMS, 
NIRS, intra-cortical recordings, VR, BCI), publish in high impact journals and secure national and international funding.


The candidate is also expected to assume the co-direction of the National Center for Competence in Research (NCCR) “Evolving 
Language” moving forward this large, interdisciplinary effort across Swiss institutions. Administrative and organizational duties 
within the Department of Fundamental Neuroscience and the Faculty of Medicine are also expected in addition to the involvement 
in the NCCR. 


Teaching at the undergraduate and postgraduate levels in the field of the position as well as supervising Masters’ and doctoral theses 
are expected. 


REQUIREMENTS: 
- MD, MD-PhD or PhD (or equivalent degree) in a field relevant to the position.
- Proven experience directing a research team and teaching.
- Track record of publications in leading international journals.
- Proven track record in securing competitive funding.
- Capacity for interdisciplinary collaboration.
- Ability to acquire a good knowledge of French within the first two years.


STARTING DATE: 01.10.2023 or according to agreement. 


Mandatory online registration before November 30th, 2022 at: http://www.unige.ch/academ 


Additional information may be obtained from: Viviane.Burghardt@unige.ch 


Women are encouraged to apply 
INSTRUCTIONAL PROFESSOR IN BIOLOGY


Description: The Biological Sciences Collegiate Division at the University of Chicago is accepting
applications for an open rank Instructional Professor position. This is a full-time teaching position
beginning on or after January 1, 2023. The appointment will be for an initial term of at least two years
with reappointment and progression possible following review.


Responsibilities include teaching courses, some designed for majors and some for non-majors, in the areas
of general biology, genetics, biochemistry, computational biology, biotechnology, and/or physiology.
Most courses include both lecture and lab components, and some are taught in partnership with university
faculty. The Instructional Professor must therefore be able to teach a course independently and to function
well as a member of a teaching team. Additional duties include hiring, training, and supervising teaching
assistants. Instructional Professors of all ranks are required to engage in regular professional development.


Qualifications: Applicants must have a PhD in the biological sciences in hand prior to start date and at
least one year of teaching university level biology lecture and laboratory courses.


Applications Instructions: Applicants must apply online at the University of Chicago’s Interfolio
website at apply.interfolio.com/113963.


The required application materials are: 1) a cover letter; 2) a curriculum vitae; 3) a teaching statement; 4)
a sample course syllabus; 5) course evaluations or evidence of past teaching performance; 6) the names
and contact information for three references.


Review of applications will begin on October 26, 2022 and will continue until the position is filled or
the search is closed.


Equal Employment Opportunity Statement: We seek a diverse pool of applicants who wish to join
an academic community that places the highest value on rigorous inquiry and encourages diverse
perspectives, experiences, groups of individuals, and ideas to inform and stimulate intellectual challenge,
engagement, and exchange. The University’s Statements on Diversity are at
https://provost.uchicago.edu/statements-diversity.


The University of Chicago is an Affirmative Action/Equal Opportunity/Disabled/Veterans Employer
and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity,
national or ethnic origin, age, status as an individual with a disability, protected veteran status, genetic
information, or other protected classes under the law. For additional information please see the
University’s Notice of Nondiscrimination.


Job seekers in need of a reasonable accommodation to complete the application process
should call 773-834-3988 or email equalopportunity@uchicago.edu with their request.


There's only one Science


Features in myIDP include:
* Exercises to help you examine your skills, interests, and values.
* A list of 20 scientific career paths with a prediction of which ones best fit your skills and interests.


Visit the website and start planning today!
myIDP.sciencecareers.org
Moving past my stutter


By Gianfranco Matrone


I could not have been better prepared for the talk, my first conference presentation as a Ph.D.
student. I had learned my speech by heart, and I had practiced the answers to potential audience
questions. The day before the talk, alone in my room, I felt confident. But as soon as the moderator
invited me on stage, I felt like I was walking the plank. I was overwhelmed by a fear I had
experienced too many times before: the fear of getting stuck because of my stutter.


It began when I was a child. The doctor told my parents stuttering was not uncommon in children
and it would go away on its own as I got older. But I felt ashamed and spoke as little as
possible at school. I focused on math and science; in these subjects, I could satisfy my
teachers with written exams and reports rather than spoken ones. I enjoyed these subjects, too,
and performed well.

But by the time I got to middle 
school, my stutter had made me a 
target for relentless bullying, which 
hurt my academics. So, I abandoned 
traditional studies and went to 
cooking school, where I hoped my 
culinary work would speak for itself. 
I saw my future as a chef—until a 
nutrition class, when the teacher’s 
discussions of protein and nucleic 
acid structure, the Krebs cycle, and 
other “delicacies” reminded me of 
my love of science. I was surprised to find that my thirst to 
learn more biology was stronger than my fear of stuttering 
during oral exams. I decided to change my life plan and go 
to university to study biology. 

Looking for a strategy to tackle my stutter, I realized I
could speak smoothly when I recited text I had learned by
heart. But this wasn’t a realistic approach for the entirety of
my university training. I needed another tool. I tried speech
therapy, but the exercises I learned there didn’t make much
of a difference. What ultimately helped was following the
predictable rhythm of a metronome during my oral exams,
which helped me speak smoothly while using my own
words rather than memorizing. Eventually I graduated, the
first of my family, and was inspired to continue in science.

I started to work in a research lab, with the ambition of
moving on to a Ph.D., but communication proved a barrier.
During discussions I was constantly on alert for words
I knew I’d be stuck on and thinking about how I could replace
them with more “suitable” words, which took my focus
away from my scientific thoughts. As a result, I came across
as distracted and unintelligent. I also failed several interviews
for a Ph.D. studentship, maybe for the same reasons. To stop my
stutter from suffocating my fledgling career, I would
need a more drastic intervention.


“I was proud of how I had overcome my fears to share my work.”

After much hesitation, I decided
to finally try psychoanalysis to confront
my deep feelings of shame
and trauma from bullying related
to my stutter. With the help of my
therapist, I slowly realized the real
problem was not the stutter itself; it
was my fear of stuttering. I resolved
to get rid of that fear. I took about
a year to spit out during one of the
weekly sessions that I love myself as
I am. Stutter or not, I was going to
pursue my desired career in science.

My first chance to test my newfound confidence by speaking in
public arrived at a departmental meeting at the institute where
I was a research assistant. I presented my data to 25 or
30 people, and received compliments for my speech for the
very first time. I returned home excited, relieved, and, most
of all, proud.

The next year, I was awarded a Ph.D. studentship and
faced my biggest test yet: that conference talk. My anxiety
was nearly overwhelming, but I held on to what I had
learned in therapy. I focused on the value of what I was saying
rather than whether I was saying it perfectly smoothly.
I ended up stuttering a few times, but each time I let it go
and kept moving forward. Afterward I was proud of how I
had overcome my fears to share my work. I got some positive
feedback, too, which raised my confidence even more.

Now, I no longer avoid public speaking; instead, I actively
seek opportunities to be on the stage. It is rewarding
and invigorating, and I feel excited to have a good story to
share. And if I happen to stutter along the way, so be it.


Gianfranco Matrone is a researcher at the University of Edinburgh. 
Send your career story to SciCareerEditor@aaas.org. 
ILLUSTRATION: ROBERT NEUBECKER


AAAS MARTIN AND ROSE WACHTEL CANCER RESEARCH AWARD


Recognizing the work of an early career scientist who has
performed outstanding research in the field of cancer. Award
nominees must have received their Ph.D. or M.D. within the last
10 years. The winner will deliver a public lecture on their research,
receive a cash award of $25,000, and publish a Focus article on
their award-winning research in Science Translational Medicine.


For more information visit
www.aaas.org/aboutaaas/awards/wachtel
or e-mail wachtelprize@aaas.org.
Deadline for submission: February 1, 2023.


AAAS | Science Translational Medicine


OPEN ACCESS


GOLD OPEN ACCESS, DIGITAL, AND FREE TO ALL READERS 


